§ AI Wiki / Glossary
One-line definitions: an AI dictionary.
The Hopper-generation GPU launched in 2022 — the standard hardware for frontier model training.
A memory-expanded H100 refresh, optimised for long-context and very large models.
NVIDIA's open-source inference server designed to serve multiple frameworks and hardware backends.
A tool that makes downloading and running LLMs on your own machine as simple as a single command.
The initial training phase where a model learns general language ability from trillions of tokens of generic data.
An open standard format that lets models move between different ML frameworks and runtimes.
The component that plans and coordinates the execution of multiple agents, models, or tools.
The general term for how a model picks the next token from its probability distribution.
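As a minimal sketch of what sampling means in practice (the function name and stability trick are illustrative, not any particular library's API), a model's raw logits can be turned into a probability distribution with a temperature-scaled softmax and then drawn from:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, seed=None):
    """Draw a token id from raw logits via temperature-scaled softmax."""
    rng = random.Random(seed)
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw from the resulting categorical distribution.
    r = rng.random()
    cumulative = 0.0
    for token_id, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return token_id
    return len(probs) - 1  # guard against floating-point round-off
```

Lower temperatures concentrate mass on the highest-logit token (approaching greedy decoding); higher temperatures flatten the distribution and increase diversity.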
An agent capable of operating over long horizons without human intervention, making and acting on its own decisions.
An agent's ability to detect errors or failed steps and automatically recover from them.
A technique in which a model evaluates its own output against explicit criteria to surface errors and weaknesses.
A practical chunking strategy that splits documents on coarse separators first, then progressively finer ones.
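A rough sketch of that coarse-to-fine recursion (a simplification: production splitters also merge small adjacent pieces back up toward the size limit, which this omits):

```python
def recursive_split(text, max_len, separators=("\n\n", "\n", " ")):
    """Split on the coarsest separator first; recurse with finer
    separators on any piece still longer than max_len."""
    if len(text) <= max_len:
        return [text]
    if not separators:
        # No separators left: fall back to hard character cuts.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, finer = separators[0], separators[1:]
    chunks = []
    for part in text.split(sep):
        if len(part) <= max_len:
            chunks.append(part)
        else:
            chunks.extend(recursive_split(part, max_len, finer))
    return [c for c in chunks if c]
```

Paragraph boundaries are tried first, then lines, then words, so chunks stay as semantically coherent as the size budget allows.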
A model type that generates the next token step-by-step, conditioned on previous tokens.
A technique that manages the KV cache like virtual memory pages, eliminating waste and fragmentation.
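The bookkeeping behind that idea can be sketched as a tiny page-table allocator (class and method names are invented for illustration; a real implementation also stores the key/value tensors themselves):

```python
class PagedKVCache:
    """Sketch of paged KV-cache bookkeeping: each sequence gets fixed-size
    pages on demand instead of one contiguous max-length buffer."""

    def __init__(self, num_pages, page_size):
        self.page_size = page_size
        self.free_pages = list(range(num_pages))
        self.page_table = {}   # seq_id -> list of physical page ids
        self.lengths = {}      # seq_id -> tokens written so far

    def append_token(self, seq_id):
        n = self.lengths.get(seq_id, 0)
        if n % self.page_size == 0:        # current page full (or none yet)
            page = self.free_pages.pop()   # allocate one new physical page
            self.page_table.setdefault(seq_id, []).append(page)
        self.lengths[seq_id] = n + 1

    def free(self, seq_id):
        # Return all of a finished sequence's pages to the free pool.
        self.free_pages.extend(self.page_table.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because pages are allocated only as tokens arrive and returned the moment a sequence finishes, memory waste is bounded by at most one partially filled page per sequence.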
The process of splitting documents into meaningful, bounded-size pieces for RAG.
An open-source PostgreSQL extension that adds vector data types and similarity search.
A fully managed, scalable vector-database SaaS.
An agent's ability to design and order the steps it will take to reach a goal.
The input text given to an LLM that conditions the response.
The discipline of systematically designing prompts to get the intended output from an LLM.
XML-like tags used inside a prompt to delimit different sections.
A high-performance open-source vector database written in Rust.
A LoRA variant combined with quantisation that lets you fine-tune 65B-parameter models on a single 48 GB GPU.
An agent pattern that interleaves reasoning steps with tool actions in a single thought-action loop.
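As a toy sketch of that loop (the `Action:`/`Final:` step format and the `calc` tool name are invented for illustration, not the pattern's canonical prompts), the model alternately emits an action to run or a final answer, with each tool observation fed back into the transcript:

```python
def react_loop(question, llm, tools, max_steps=5):
    """Toy ReAct loop: the model emits either 'Action: tool[input]'
    or 'Final: answer'; tool observations are appended to the transcript."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = llm(transcript)           # one thought-action per model call
        transcript += "\n" + step
        if step.startswith("Final:"):
            return step[len("Final:"):].strip()
        if step.startswith("Action:"):
            name, _, arg = step[len("Action:"):].strip().partition("[")
            observation = tools[name](arg.rstrip("]"))
            transcript += f"\nObservation: {observation}"
    return None  # give up after max_steps

# Usage with a scripted stand-in for the model:
steps = iter(["Action: calc[2+2]", "Final: 4"])
answer = react_loop("What is 2+2?", lambda _t: next(steps),
                    {"calc": lambda expr: eval(expr)})  # toy tool; eval is demo-only
```

The key property is the interleaving: every observation lands in the transcript before the next model call, so later reasoning steps can condition on earlier tool results.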