TPS (tokens per second) measures how many tokens an LLM produces each second, and it is the most user-visible facet of inference speed. The 'TPS Wars' of 2023-2024, with Groq and Cerebras leading on specialised hardware, turned the metric into one of the industry's headline numbers: where a GPU might serve 50-150 TPS, dedicated accelerators started pushing past 500 for a single user. But TPS doesn't tell the whole story. It should be read alongside TTFT (time to first token), and per-user TPS is not the same as server-wide throughput, which sums the tokens produced across all concurrent requests. Streaming UIs let the user feel a high TPS in real time; in classic batch-response interfaces the speedup often disappears behind the loading spinner.
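To make the distinction between TTFT, per-user TPS, and server-wide throughput concrete, here is a minimal Python sketch. It assumes you have recorded, for each streamed response, the time the request was sent and the arrival time of every token. The names `StreamTrace`, `ttft`, `per_user_tps`, `server_throughput`, and `synthetic_trace` are hypothetical and introduced only for illustration, and the timings are synthetic, not measurements of any real system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StreamTrace:
    """Timing of one streamed response; all times are in seconds."""
    request_sent: float
    token_times: List[float]  # arrival time of each generated token


def ttft(trace: StreamTrace) -> float:
    """Time to first token: delay before the first token arrives."""
    return trace.token_times[0] - trace.request_sent


def per_user_tps(trace: StreamTrace) -> float:
    """Decode speed seen by one user, excluding the wait for the first token."""
    decode_time = trace.token_times[-1] - trace.token_times[0]
    if decode_time <= 0:
        return 0.0  # fewer than two tokens: no decode interval to measure
    return (len(trace.token_times) - 1) / decode_time


def server_throughput(traces: List[StreamTrace]) -> float:
    """Tokens produced across all requests, divided by total wall-clock time."""
    total_tokens = sum(len(t.token_times) for t in traces)
    start = min(t.request_sent for t in traces)
    end = max(t.token_times[-1] for t in traces)
    return total_tokens / (end - start)


def synthetic_trace(first_token_at: float, n_tokens: int, rate: int) -> StreamTrace:
    """Assumed data: request at t=0, then tokens arriving at a steady rate."""
    times = [first_token_at + i / rate for i in range(n_tokens)]
    return StreamTrace(request_sent=0.0, token_times=times)


# Two users served concurrently, each streaming 101 tokens at 100 tokens/s
# after waiting 0.5 s for the first token.
user_a = synthetic_trace(first_token_at=0.5, n_tokens=101, rate=100)
user_b = synthetic_trace(first_token_at=0.5, n_tokens=101, rate=100)

print(f"TTFT:              {ttft(user_a):.2f} s")
print(f"Per-user TPS:      {per_user_tps(user_a):.1f}")
print(f"Server throughput: {server_throughput([user_a, user_b]):.1f} tokens/s")
```

In this synthetic case each user sees about 100 TPS once tokens start flowing, while the server as a whole delivers more than that because it is serving two streams at once. The 0.5 s TTFT shows up in neither per-user figure, which is why the two metrics are read together.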
MEVZU · N°124 · ISTANBUL · YEAR I · VOL. III
Glossary · Beginner · 2023
Tokens Per Second (TPS)
How many tokens a model generates per second — the most visible metric of inference speed.
- EN (English term): Tokens Per Second (TPS)
- TR (Turkish term): Saniyedeki Token (TPS)