Streaming is the technique of sending an LLM's tokens to the client as they are produced, rather than waiting for the full response to finish — exactly what gives ChatGPT its familiar 'typing' feel. The user sees the first words after only a time-to-first-token (TTFT) delay, which largely dissolves the 'slow model' perception: the same total response time feels dramatically faster when streamed. Technically it is delivered over Server-Sent Events (SSE) or WebSockets, and OpenAI, Anthropic, and the other major APIs support it natively. It is now a foundational ingredient of modern LLM UX — a non-streaming chat interface is essentially unshippable in 2026.
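To make the SSE delivery concrete, here is a minimal, self-contained Python sketch of how a client consumes a token stream. The `data:` line format is the SSE standard; the JSON payload shape and the `[DONE]` sentinel are illustrative assumptions modeled loosely on common LLM APIs, not a specific provider's exact schema.

```python
import json

# Simulated SSE stream as an LLM API might emit it: each event is a
# "data:" line carrying a JSON payload with one token (illustrative format).
sse_stream = [
    'data: {"token": "Hello"}',
    'data: {"token": ","}',
    'data: {"token": " world"}',
    'data: [DONE]',  # sentinel several APIs use to mark end of stream
]

def stream_tokens(lines):
    """Yield tokens from SSE 'data:' lines as they arrive."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip comments and keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        yield json.loads(payload)["token"]

# A real client would render each token the moment it arrives,
# instead of accumulating them as we do here for demonstration.
text = "".join(stream_tokens(sse_stream))
print(text)  # -> Hello, world
```

The key point is that `stream_tokens` is a generator: each token is available to the UI immediately, so perceived latency is the TTFT rather than the full generation time.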
MEVZU N°124 · ISTANBUL · YEAR I — VOL. III
Glossary · Beginner · 2022
Streaming Output
Sending the model's response token-by-token in real time rather than waiting for the complete answer.
- EN — Streaming Output
- TR — Akış Çıktısı