§ AI Wiki / Glossary
One-line definitions, the AI dictionary.
Transformer: The attention-based neural network architecture that underpins virtually every modern LLM.
Token: The smallest unit a language model processes — a word fragment, character, or symbol.
Test-time compute: An approach where a model spends more compute at inference time, not just during training, to improve performance.
The competitive period that emerged in 2024, with inference providers racing to deliver the highest tokens per second (TPS).
Foundation model: A large-scale AI model pretrained on broad data that can be adapted to many downstream tasks.
Toxicity: Model responses that contain hateful, harassing, or otherwise harmful content.
Text-to-speech (TTS): Technology that turns written text into natural-sounding speech.
Browser agent: An AI agent that navigates and acts on web pages by driving a browser like a human.
TensorRT-LLM: NVIDIA's hardware-tuned, high-performance inference library and compiler.
Grounding: The practice of anchoring an LLM's answer to a verified external source.
Tokens per second (TPS): How many tokens a model generates per second — the most visible metric of inference speed.
TPU (Tensor Processing Unit): Google's custom ASIC accelerator family designed for deep-learning workloads.
Time to first token (TTFT): The time between sending a request and receiving the first generated token.
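The two latency metrics above (TTFT and tokens per second) can be measured from any streaming response. A minimal sketch, assuming a generic iterator of tokens; `fake_stream` below is a hypothetical stand-in for a real model's streaming API:

```python
import time

def measure_stream(token_iter):
    """Measure time-to-first-token (TTFT) and decode tokens/second (TPS)
    from any iterator that yields generated tokens."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in token_iter:
        if first is None:
            first = time.perf_counter()  # first token arrived
        count += 1
    end = time.perf_counter()
    if first is None:  # stream produced nothing
        return None, 0.0
    ttft = first - start
    decode_time = end - first
    # TPS counts tokens after the first, over the decode window
    tps = (count - 1) / decode_time if decode_time > 0 else 0.0
    return ttft, tps

def fake_stream(n=20, delay=0.005):
    """Hypothetical stand-in for a model's streaming response."""
    for i in range(n):
        time.sleep(delay)
        yield f"tok{i}"

ttft, tps = measure_stream(fake_stream())
print(f"TTFT: {ttft * 1000:.1f} ms, TPS: {tps:.0f}")
```

Real benchmarks measure these against a live endpoint; the split between TTFT (dominated by prompt processing) and TPS (decode speed) is the same.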
Tokenization: The process of converting raw text into a sequence of model-readable tokens.
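A toy greedy longest-match tokenizer illustrates the idea; real tokenizers (e.g. BPE) learn their vocabularies from data, and the small vocabulary below is made up for the example:

```python
def toy_tokenize(text, vocab):
    """Greedy longest-match tokenization over a fixed vocabulary.
    Unknown characters fall back to single-character tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, shrinking until a match
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # single-char fallback
            i += 1
    return tokens

vocab = {"token", "ization", "un", "happy"}  # made-up vocabulary
print(toy_tokenize("tokenization", vocab))  # → ['token', 'ization']
```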
Top-K sampling: A sampling strategy that picks the next token from only the K most likely candidates.
Top-P (nucleus) sampling: A sampling method that draws from the smallest set of candidates whose cumulative probability exceeds P.