Autoregressive models generate a sequence one element at a time, each new step conditioned on what came before, and they make up the vast majority of generative LLMs in use today. The GPT family, Claude, Llama 3, and Gemini all learn the distribution of the next token given the previous ones, then generate text by sampling from that distribution. This recipe yields powerful language modelling, but the one-token-at-a-time pattern makes inference fundamentally sequential, which is exactly why techniques like Speculative Decoding, KV Cache, and Continuous Batching exist. Compared to parallel-generation architectures such as Diffusion, the autoregressive approach maps especially naturally onto the sequential nature of language.
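The core loop can be illustrated with a tiny sketch: a hand-written bigram table standing in for a real neural model (the table, the token names, and the function names below are all illustrative, not from any actual library). Each step samples one token from the conditional distribution given the previous token, then feeds that token back in as the new context.

```python
import random

# Toy "model": hand-set counts defining P(next token | previous token).
# A real LLM computes this distribution with a neural network over the
# whole prefix; the generation loop below is the same either way.
BIGRAM = {
    "<s>": {"the": 2, "a": 1},
    "the": {"cat": 3, "dog": 1},
    "a":   {"dog": 2, "cat": 1},
    "cat": {"sat": 2, "</s>": 1},
    "dog": {"sat": 1, "</s>": 2},
    "sat": {"</s>": 1},
}

def sample_next(prev, rng):
    """One autoregressive step: draw a token from P(next | prev)."""
    dist = BIGRAM[prev]
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

def generate(rng, max_len=10):
    """Generate left to right, conditioning each step on the last token."""
    out, prev = [], "<s>"
    for _ in range(max_len):
        tok = sample_next(prev, rng)
        if tok == "</s>":          # end-of-sequence token
            break
        out.append(tok)
        prev = tok                  # the sequential dependency
    return out

rng = random.Random(0)
print(" ".join(generate(rng)))
```

Note that each call to `sample_next` must wait for the previous one to finish, since its input is the previous output. That data dependency is precisely what makes naive autoregressive inference sequential.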
MEVZU N°124 · ISTANBUL · YEAR I — VOL. III
Glossary · Intermediate · 2018
Autoregressive Model
A model type that generates the next token step-by-step, conditioned on previous tokens.
- EN — Autoregressive Model
- TR — Özyinelemeli Model