§ AI Wiki / Glossary
One-line definitions, the AI dictionary.
RLAIF (Reinforcement Learning from AI Feedback): An alignment approach that uses another LLM, instead of human labellers, as the source of preference signals.
RLHF (Reinforcement Learning from Human Feedback): An alignment technique that trains a reward model from human preferences and then optimises the LLM against it.
System prompt: A special message at the start of a conversation that sets the model's persistent instructions and role.
Zero-shot: When the model performs a task with no examples, given only the instruction.
Cold start: The slow first response when a model or service has been idle and must initialise on demand.
Post-training: The stage after pre-training that turns a raw model into a helpful, safe, instruction-following assistant.
Speculative decoding: An inference speedup where a small draft model proposes multiple tokens that the big model then verifies in parallel.
Server-Sent Events (SSE): A simple HTTP-based standard for one-way live streams from server to browser.
Continuous batching: A dynamic serving technique where new requests can join an in-flight batch and finished ones leave immediately.
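The scheduling idea can be shown with a toy simulator: at every decode step, finished sequences free their slot at once and queued requests take it, instead of waiting for the whole batch to drain. Request names and lengths here are made up.

```python
# Toy continuous-batching scheduler: each request is (name, tokens_to_generate)
# and the trace records which requests shared the batch at every decode step.
from collections import deque

def serve(requests, max_batch=2):
    queue = deque(requests)
    running = {}                      # name -> tokens still to generate
    trace = []
    while queue or running:
        # Admit waiting requests into free slots (the "continuous" part).
        while queue and len(running) < max_batch:
            name, need = queue.popleft()
            running[name] = need
        trace.append(sorted(running))
        # One decode step for every running sequence.
        for name in list(running):
            running[name] -= 1
            if running[name] == 0:
                del running[name]     # leaves the batch immediately
    return trace
```

With static batching, request "c" would wait until both "a" and "b" finished; here it slips into "a"'s slot after one step.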
Browser use: An AI agent navigating and acting on web pages by driving a browser as a human would.
TensorRT-LLM: NVIDIA's hardware-tuned high-performance inference library and compiler.
Grounding: The practice of anchoring an LLM's answer to a verified external source.
Tokens per second (TPS): How many tokens a model generates per second, the most visible metric of inference speed.
TPU (Tensor Processing Unit): Google's custom ASIC accelerator family designed specifically for deep-learning workloads.
Time to first token (TTFT): The time between sending a request and receiving the first generated token.
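Both latency metrics, time to first token and tokens per second, can be measured against any streaming generator with a stopwatch. `fake_stream` below is an invented stand-in for a real model's token stream.

```python
# Measure TTFT and tokens/second for any iterable token stream.
import time

def fake_stream(n=5, delay=0.01):
    # Stand-in for a model: yields n tokens with a small delay each.
    for i in range(n):
        time.sleep(delay)
        yield f"tok{i}"

def measure(stream):
    start = time.perf_counter()
    ttft, count = None, 0
    for _ in stream:
        count += 1
        if ttft is None:
            ttft = time.perf_counter() - start   # first token arrived
    total = time.perf_counter() - start
    return ttft, count / total                   # (seconds, tokens per second)
```

Note the two metrics pull in different directions: batching more requests raises aggregate tokens per second but can worsen each request's TTFT.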
Mixture of Experts (MoE): An architecture where only a subset of expert sub-networks activates per token, combining huge capacity with cheaper inference.
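The routing step is the heart of the idea: a gate scores all experts per token, only the top-k actually run, and their outputs are blended by normalized gate weights. A toy sketch, with scalar functions standing in for expert networks and an arbitrary deterministic gate:

```python
# Toy top-k Mixture-of-Experts forward pass.
import math

experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2, lambda x: -x]

def gate_scores(x):
    # Stand-in gating network: arbitrary deterministic score per expert.
    return [math.sin(x * (i + 1)) for i in range(len(experts))]

def moe_forward(x, k=2):
    scores = gate_scores(x)
    topk = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected experts only.
    exps = [math.exp(scores[i]) for i in topk]
    total = sum(exps)
    # Only k experts execute, which is where the capacity-vs-cost win comes from.
    return sum((w / total) * experts[i](x) for w, i in zip(exps, topk))
```

With k=1 only a single expert fires, so compute per token stays flat no matter how many experts the model holds.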
Long-term memory: Memory that persists beyond a single session, available to an agent across future runs.
Vector database: A specialized database that stores high-dimensional vectors and performs similarity search over them.
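At its core that is just nearest-neighbour search over stored vectors. Production systems use approximate indexes (HNSW and friends); this brute-force in-memory sketch, with invented item IDs, shows only the contract:

```python
# Minimal in-memory vector store with cosine-similarity top-k search.
import math

class TinyVectorStore:
    def __init__(self):
        self.items = []                       # (id, vector) pairs

    def add(self, item_id, vec):
        self.items.append((item_id, vec))

    def search(self, query, k=2):
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.hypot(*a) * math.hypot(*b))
        scored = [(cos(query, v), i) for i, v in self.items]
        scored.sort(reverse=True)             # most similar first
        return [i for _, i in scored[:k]]
```

In RAG pipelines the stored vectors are text embeddings, so "nearest" approximates "most semantically similar".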
Throughput: The total number of tokens, requests, or jobs a system can process per unit of time.
vLLM: An open-source inference framework that delivers high-throughput LLM serving via PagedAttention.
Weaviate: An open-source vector database with built-in hybrid search and modular vectorizers.
Reflection: A two-step agent pattern where an LLM critiques and revises its own output.
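The critique-and-revise loop takes three model calls: draft, critique, revise. In this sketch `llm` is a fake stand-in, not a real API, scripted to produce a flawed draft so the revision path is exercised:

```python
# Toy reflection loop: draft -> critique -> (revise if the critique objects).

def llm(prompt):
    # Fake model with canned behaviour for each role in the loop.
    if prompt.startswith("Critique:"):
        return "The arithmetic is wrong." if "5" in prompt else "Looks correct."
    if prompt.startswith("Revise:"):
        return "2+2=4"
    return "2+2=5"                      # the flawed first draft

def reflect(task):
    draft = llm(task)
    critique = llm(f"Critique: {task} -> {draft}")
    if "wrong" in critique:
        return llm(f"Revise: {task} given critique '{critique}'")
    return draft
```

The pattern trades extra inference cost (two or more additional calls) for a chance to catch the model's own mistakes before the answer is returned.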
Structured output: A feature that makes the model's output conform exactly to a predefined schema.
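Real implementations constrain decoding token by token so invalid output can never be produced; the contract is easier to see at the end of the pipeline, where the model's JSON either matches the schema or is rejected. A sketch with an invented two-field schema:

```python
# Validate model output against a simple required schema: exact keys, exact types.
import json

SCHEMA = {"name": str, "age": int}

def parse_structured(raw):
    obj = json.loads(raw)                       # must be valid JSON at all
    if set(obj) != set(SCHEMA):
        raise ValueError(f"keys {set(obj)} do not match schema")
    for key, typ in SCHEMA.items():
        if not isinstance(obj[key], typ):
            raise ValueError(f"{key} must be {typ.__name__}")
    return obj
```

Downstream code can then consume the result without defensive parsing, which is the point of the feature.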
Reranking: A second stage that re-orders first-stage retrieval results with a stronger model.
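The two-stage shape can be sketched with toy scorers: a cheap word-overlap retriever standing in for BM25, and a phrase-aware scorer standing in for a cross-encoder. Documents and queries here are invented.

```python
# Two-stage retrieval: fast recall first, slower precision scoring second.
DOCS = {
    "d1": "domestic cats and wild animals differ",
    "d2": "dogs are loyal domestic animals",
    "d3": "cars are fast machines",
}

def first_stage(query, k=3):
    # Cheap recall: rank documents by count of shared words.
    words = set(query.split())
    return sorted(DOCS, key=lambda d: -len(words & set(DOCS[d].split())))[:k]

def rerank(query, doc_ids):
    # "Stronger" precision scorer: rewards an exact phrase hit, standing in
    # for a cross-encoder that reads query and document together.
    def score(d):
        return (query in DOCS[d]) * 10 + len(set(query.split()) & set(DOCS[d].split()))
    return sorted(doc_ids, key=score, reverse=True)
```

The first stage ties "d1" and "d2" on word overlap; the reranker promotes "d2" because it contains the query as an exact phrase, which is exactly the kind of distinction the cheap stage misses.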