§ AI Wiki / Glossary
One-line definitions: the AI dictionary.
Floating-point operations per second — the classic metric for raw compute power.
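A minimal sketch of how FLOP counts and FLOP/s relate, using the standard convention that one multiply-add counts as 2 FLOPs; the matrix sizes and timing below are purely illustrative assumptions.

```python
def matmul_flops(m: int, k: int, n: int) -> int:
    """An (m x k) @ (k x n) matmul does m*n*k multiply-adds,
    conventionally counted as 2 FLOPs each."""
    return 2 * m * n * k

def flops_per_second(total_flops: int, seconds: float) -> float:
    return total_flops / seconds

# Illustrative numbers: a 4096x4096x4096 matmul finishing in 10 ms.
flops = matmul_flops(4096, 4096, 4096)        # ~137 GFLOPs of work
print(flops_per_second(flops, 0.010) / 1e12)  # ~13.7 TFLOP/s achieved
```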
An API feature that lets a model invoke a predefined function via structured JSON output.
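A sketch of the mechanics, assuming a hypothetical `get_weather` tool: the application describes the function in a JSON-Schema-style declaration, the model emits a structured call instead of free text, and the application parses and dispatches it.

```python
import json

# Hypothetical tool schema, in the JSON-Schema style most
# function-calling APIs use to advertise available functions.
GET_WEATHER = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def dispatch(call_json: str) -> str:
    """Parse the model's structured function call and invoke the real function."""
    call = json.loads(call_json)
    if call["name"] == "get_weather":
        return f"Weather in {call['arguments']['city']}: sunny"  # stubbed result
    raise ValueError(f"unknown function: {call['name']}")

# Instead of prose, the model would emit something like:
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
print(dispatch(model_output))  # Weather in Oslo: sunny
```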
The time between issuing a request and receiving a result.
A processor that runs massive parallel computations — the workhorse of deep learning.
A RAG variant that extracts a knowledge graph from documents, letting retrieval follow relationships between entities instead of matching isolated passages.
An approach that combines keyword and semantic search to improve retrieval quality.
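One common way to merge the two result lists is Reciprocal Rank Fusion (RRF); a minimal sketch with made-up document ids, where each document scores the sum of 1/(k + rank) across the lists it appears in.

```python
def rrf(rankings, k: int = 60):
    """Reciprocal Rank Fusion: merge several ranked lists of doc ids.
    Each doc scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits  = ["d3", "d1", "d7"]   # e.g. a BM25 keyword ranking
semantic_hits = ["d1", "d9", "d3"]   # e.g. a vector-similarity ranking
print(rrf([keyword_hits, semantic_hits]))  # d1 wins: it ranks well in both
```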
A graph-based algorithm for fast approximate nearest-neighbor search over high-dimensional vectors.
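The core routine HNSW runs on each layer is a greedy best-first walk over a proximity graph. A simplified single-layer sketch (a real HNSW index has multiple layers and builds the graph itself; the toy graph and points here are assumptions):

```python
import heapq
import math

def greedy_search(graph, points, query, entry, ef: int = 3):
    """Greedy best-first search over a proximity graph: expand the nearest
    unvisited candidate, keep the ef best results seen so far, and stop
    when no candidate can improve on them."""
    d0 = math.dist(points[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap of nodes to expand
    best = [(-d0, entry)]        # max-heap (negated) of current results
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:
            break  # nearest unexpanded candidate is worse than worst result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            nd = math.dist(points[nb], query)
            if len(best) < ef or nd < -best[0][0]:
                heapq.heappush(candidates, (nd, nb))
                heapq.heappush(best, (-nd, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # evict the worst result
    return sorted((-d, n) for d, n in best)

# Toy 2-D points and a hand-built neighbor graph (assumed, not a real index).
points = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (2, 1), 4: (5, 5)}
graph  = {0: [1], 1: [0, 2], 2: [1, 3, 4], 3: [2], 4: [2]}
print(greedy_search(graph, points, query=(2.1, 0.9), entry=0))
```

The walk reaches node 3, the true nearest neighbor, without ever computing the distance to most of a large index — that locality is what makes the search approximate but fast.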
Adapting a pre-trained model to a specific task using smaller, targeted data.
A decoding algorithm that keeps the K most-likely candidate sequences alive in parallel during generation.
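A minimal sketch over a toy "model" — a lookup table of next-symbol probabilities conditioned on the last symbol (the numbers are invented purely to exercise the algorithm):

```python
import math

# Toy next-symbol distribution conditioned on the previous symbol.
NEXT = {
    None: {"a": 0.6, "b": 0.4},
    "a":  {"a": 0.1, "b": 0.9},
    "b":  {"a": 0.5, "b": 0.5},
}

def beam_search(steps: int, k: int):
    """Keep the k highest log-probability sequences alive at every step."""
    beams = [((), 0.0)]  # (sequence, cumulative log-prob)
    for _ in range(steps):
        expanded = []
        for seq, lp in beams:
            last = seq[-1] if seq else None
            for sym, p in NEXT[last].items():
                expanded.append((seq + (sym,), lp + math.log(p)))
        beams = sorted(expanded, key=lambda x: x[1], reverse=True)[:k]
    return beams

best_seq, best_lp = beam_search(steps=3, k=2)[0]
print(best_seq, math.exp(best_lp))
```

Unlike greedy decoding, the second beam keeps a lower-probability prefix alive in case it leads to a better full sequence later.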
An ANN indexing technique that partitions the vector space into clusters to speed up search.
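A toy sketch of the inverted-file idea: assign each vector to its nearest centroid at build time, then scan only the `nprobe` closest clusters at query time. The centroids here are hand-picked assumptions; a real index would learn them with k-means.

```python
import math

def build_ivf(vectors, centroids):
    """Inverted file: one posting list of vector ids per centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: math.dist(v, centroids[c]))
        lists[nearest].append(i)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe=1):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)),
                   key=lambda c: math.dist(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    return min(candidates, key=lambda i: math.dist(query, vectors[i]))

vectors   = [(0, 0), (0, 1), (10, 10), (10, 11), (9, 10.5)]
centroids = [(0, 0.5), (10, 10.3)]
lists = build_ivf(vectors, centroids)
print(ivf_search((9.5, 10.2), vectors, centroids, lists, nprobe=1))
```

With `nprobe=1` the query touches only the second cluster's three vectors, never the two near the origin; raising `nprobe` trades speed for recall.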
An API setting that guarantees the model's output is valid JSON.
A reference that ties information in an LLM's answer back to its source document.
The in-session memory an agent keeps within its context window — recent turns and intermediate state.
A control layer that keeps an LLM or agent within sanctioned behavior boundaries.
The cache that stores previously computed key/value vectors so the model doesn't recompute them every step.
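The saving can be sketched by counting key/value projections during a fake generation loop (the `kv_project` stub and token names are assumptions; real K/V are per-layer matrices): without a cache the work per step grows with sequence length, with one it stays constant.

```python
calls = {"kv": 0}

def kv_project(token):
    """Stand-in for the key/value projection of one token (counted per call)."""
    calls["kv"] += 1
    return (hash(token), hash(token) // 2)  # fake K and V

def generate_no_cache(prompt, steps):
    seq = list(prompt)
    for _ in range(steps):
        kvs = [kv_project(t) for t in seq]  # recompute K/V for EVERY position
        seq.append(f"tok{len(seq)}")        # pretend attention picked a token
    return seq

def generate_with_cache(prompt, steps):
    seq, cache = list(prompt), []
    for _ in range(steps):
        while len(cache) < len(seq):        # project only uncached positions
            cache.append(kv_project(seq[len(cache)]))
        seq.append(f"tok{len(seq)}")
    return seq

calls["kv"] = 0
generate_no_cache(["a", "b"], steps=4)
quadratic = calls["kv"]            # 2 + 3 + 4 + 5 = 14 projections
calls["kv"] = 0
generate_with_cache(["a", "b"], steps=4)
linear = calls["kv"]               # 2 + 1 + 1 + 1 = 5 projections
print(quadratic, linear)
```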
A stateful, graph-based agent workflow framework from the LangChain team.
Georgi Gerganov's open-source C++ project that made running LLMs locally a practical reality.
A fine-tuning technique that trains only small low-rank matrices instead of every weight, dramatically cutting memory.
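A bare-bones sketch of the arithmetic: the frozen weight W is used as W' = W + (alpha/r)·B·A, where only the small B (d×r) and A (r×d) matrices train. Dimensions and values below are toy assumptions chosen so the numbers are easy to check.

```python
def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

d, r = 4, 1   # toy sizes: 4x4 frozen weight, rank-1 adapter
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
B = [[0.5], [0.0], [0.0], [0.0]]   # d x r, trainable (zero-init in practice)
A = [[0.0, 2.0, 0.0, 0.0]]         # r x d, trainable
alpha = 1.0

# Effective weight: W' = W + (alpha / r) * B @ A; only A and B get gradients.
delta = matmul(B, A)
W_eff = [[W[i][j] + (alpha / r) * delta[i][j] for j in range(d)]
         for i in range(d)]

full_params = d * d                # training every weight
lora_params = d * r + r * d        # training only A and B
print(full_params, lora_params)    # 16 vs 8 here; the gap explodes as d grows
```

At d = 4096 and r = 8 the same formula gives ~16.8M full parameters versus ~65K trainable ones, which is where the memory saving comes from.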
A training objective where the model learns to predict tokens that have been masked out of a sentence.
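A BERT-style sketch of building one masked training example: a random subset of tokens is replaced with `[MASK]`, and the originals become the prediction targets (the 15% rate and the seeding are assumptions for reproducibility; real pipelines also sometimes substitute random tokens).

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Replace a random subset of tokens with [MASK]; keep the originals
    as labels so the model is trained only at the masked positions."""
    rng = rng or random.Random(0)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)      # the model must predict this token
        else:
            masked.append(tok)
            labels.append(None)     # no loss at unmasked positions
    return masked, labels

masked, labels = mask_tokens("the cat sat on the mat".split(), mask_prob=0.3)
print(masked)
print(labels)
```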
How much of a model's theoretical peak FLOPs is actually delivered during real training — a key efficiency metric.
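The ratio itself is simple arithmetic; the numbers below are illustrative assumptions, not measurements (a common rule of thumb is roughly 6 × parameter-count training FLOPs per token, so 6e9 corresponds to a ~1B-parameter model).

```python
def mfu(tokens_per_s: float, flops_per_token: float, peak_flops: float) -> float:
    """Model FLOPs Utilization: useful FLOP/s achieved over hardware peak."""
    return tokens_per_s * flops_per_token / peak_flops

# Assumed numbers: 50k tokens/s, ~6e9 training FLOPs per token,
# hardware peak of 1e15 FLOP/s (1 PFLOP/s).
print(f"{mfu(50_000, 6e9, 1e15):.0%}")  # 30%
```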
Apple's open-source ML framework purpose-built for Apple Silicon, with a NumPy-like API.
Representing model weights with lower-precision numbers to save memory and gain speed.
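A minimal sketch of symmetric int8 quantization, one of the simplest schemes: scale all weights by the largest magnitude so they fit in [-127, 127], store 1-byte integers plus one float scale, and multiply back to approximate the originals (the weight values are arbitrary examples).

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: one shared scale maps floats to [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

w = [0.12, -0.5, 0.33, 1.0]
q, scale = quantize_int8(w)
approx = dequantize(q, scale)
print(q)       # small integers: 1 byte each instead of 4 for float32
print(approx)  # close to the originals, within half a quantization step
```

The memory saving is the point: 4x smaller than float32 here, and in LLMs schemes like 4-bit quantization push the ratio further at some accuracy cost.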
A specialised AI accelerator integrated into phones and laptops to run neural workloads efficiently.
NVIDIA's Ampere-generation GPU launched in 2020, the workhorse of deep learning for years.