Tokens Per Second (TPS)

TPS (tokens per second) measures how many Tokens an LLM produces each second, and it is the most user-visible facet of inference speed. The 'TPS Wars' of 2023-2024, with Groq and Cerebras leading on specialised hardware, turned the metric into one of the industry's headline numbers: where a GPU might serve 50-150 TPS, dedicated accelerators started pushing past 500 for a single user. But TPS doesn't tell the whole story. It should be read alongside TTFT (time to first token), which covers the wait before any output appears, and per-user TPS is not the same as server-wide Throughput, which counts tokens across all concurrent requests. Streaming UIs let the user feel a high TPS in real time; in classic batch-response interfaces, where nothing is shown until the whole answer is ready, the speedup often disappears behind the loading spinner.
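
The difference between TTFT, per-user TPS and server-wide Throughput is easier to see with numbers. The following is a minimal Python sketch, not a standard benchmark: the `RequestTrace` structure and the function names are hypothetical, and it assumes you have recorded the arrival time of every streamed token. Per-user TPS is computed over the decode phase only (after the first token), which is one common convention; some tools divide by the full request time instead.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Timing of one streamed response (hypothetical structure)."""
    sent_at: float            # when the request was sent, in seconds
    token_times: list[float]  # arrival time of each output token, in seconds


def ttft(trace: RequestTrace) -> float:
    """Time to first token: the wait before any output appears."""
    return trace.token_times[0] - trace.sent_at


def per_user_tps(trace: RequestTrace) -> float:
    """Decode-phase tokens per second for a single request.

    Counts the intervals after the first token, so TTFT is excluded.
    """
    decode_time = trace.token_times[-1] - trace.token_times[0]
    if decode_time <= 0:
        raise ValueError("need at least two tokens at distinct times")
    return (len(trace.token_times) - 1) / decode_time


def server_throughput(traces: list[RequestTrace]) -> float:
    """Aggregate tokens per second across all requests in the window."""
    total_tokens = sum(len(t.token_times) for t in traces)
    start = min(t.sent_at for t in traces)
    end = max(t.token_times[-1] for t in traces)
    return total_tokens / (end - start)


# Made-up data: first token after 0.5 s, then one token every 20 ms.
one_user = RequestTrace(
    sent_at=0.0,
    token_times=[0.5 + 0.02 * i for i in range(101)],
)
# Four identical requests served concurrently.
batch = [one_user] * 4

print(f"TTFT: {ttft(one_user):.2f} s")
print(f"Per-user TPS: {per_user_tps(one_user):.1f}")
print(f"Server throughput: {server_throughput(batch):.1f} tokens/s")
```

With this made-up trace, each user sees roughly 50 TPS after a 0.5 s wait, while the server as a whole delivers about 160 tokens per second. That gap is why a headline TPS figure needs to state whether it is per user or aggregate.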