MEVZU N° TAG / VOL. 160
#training
0 blog · 0 news · 10 wiki
Wiki
Synthetic Data
Training data generated by another model rather than (or in addition to) real-world data.
- EN
- Synthetic Data
- TR
- Sentetik Veri
RLHF — Reinforcement Learning from Human Feedback
An alignment technique that trains a reward model from human preferences and then optimises the LLM against it.
- EN
- RLHF (Reinforcement Learning from Human Feedback)
- TR
- RLHF — İnsan Geri Bildirimiyle Pekiştirmeli Öğrenme
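The first step described above, training a reward model from human preferences, is typically done with a pairwise (Bradley-Terry) loss. A minimal numpy sketch, assuming scalar reward scores for a chosen and a rejected response (the function name and inputs are illustrative, not from any specific library):

```python
import numpy as np

def reward_model_loss(r_chosen, r_rejected):
    """Pairwise (Bradley-Terry) loss for reward-model training:
    the human-preferred response should score higher than the rejected one."""
    margin = r_chosen - r_rejected
    return -np.log(1.0 / (1.0 + np.exp(-margin)))  # -log sigmoid(margin)

# A larger score margin between chosen and rejected yields a smaller loss.
small_margin = reward_model_loss(0.5, 0.0)
large_margin = reward_model_loss(2.0, 0.0)
```

The trained reward model then provides the optimisation signal for the RL stage (commonly PPO).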
LoRA (Low-Rank Adaptation)
A fine-tuning technique that trains only small low-rank matrices instead of every weight, dramatically cutting memory.
- EN
- LoRA (Low-Rank Adaptation)
- TR
- LoRA (Düşük-Mertebeli Adaptasyon)
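The idea above can be sketched in a few lines of numpy: the pre-trained weight W stays frozen, and only two small low-rank factors are trained. The dimensions here are toy values for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                         # hidden size and low rank (r << d)

W = rng.normal(size=(d, d))         # frozen pre-trained weight, never updated
A = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection, zero-initialised
                                    # so the adapted model starts identical to W

x = rng.normal(size=(d,))
y = x @ (W + B @ A).T               # adapted forward pass: (W + BA) x

# Only A and B are trained: 2*d*r parameters instead of d*d.
trainable = A.size + B.size
```

With r much smaller than d, the trainable parameter count drops from d² to 2dr, which is where the memory savings come from.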
DPO — Direct Preference Optimization
An RLHF alternative that directly optimises a model on preference data, skipping the explicit RL loop.
- EN
- DPO (Direct Preference Optimization)
- TR
- DPO — Doğrudan Tercih Optimizasyonu
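The DPO objective can be written directly from log-probabilities, with no reward model or RL loop. A minimal sketch for a single preference pair, assuming summed token log-probabilities from the policy and a frozen reference model (the function name is illustrative):

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss: push the policy to prefer the chosen response
    by a larger margin than the frozen reference model does."""
    margin = beta * ((logp_chosen - ref_chosen)
                     - (logp_rejected - ref_rejected))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))  # -log sigmoid(margin)

# When the policy exactly matches the reference, the margin is 0
# and the loss is -log(0.5).
loss = dpo_loss(logp_chosen=-5.0, logp_rejected=-7.0,
                ref_chosen=-5.0, ref_rejected=-7.0)
```

The beta hyperparameter controls how far the policy may drift from the reference model.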
Pre-training
The initial training phase, where a model learns general language ability from a massive corpus of largely unlabeled text, often trillions of tokens.
- EN
- Pre-training
- TR
- Ön Eğitim
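The dominant pre-training objective for LLMs is next-token prediction: at every position the target is simply the following token. A toy numpy sketch with a dummy uniform model (values are illustrative):

```python
import numpy as np

# Next-token prediction: shift the sequence by one to build targets.
tokens = np.array([5, 2, 9, 1])
inputs, targets = tokens[:-1], tokens[1:]

vocab = 10
logits = np.zeros((len(inputs), vocab))  # a dummy model: uniform over the vocab
logprobs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -logprobs[np.arange(len(targets)), targets].mean()
# A uniform model scores exactly log(vocab) in cross-entropy.
```

Driving this loss below log(vocab) across a huge corpus is what forces the model to learn language structure.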
Masked Language Modeling
A training objective where the model learns to predict tokens that have been masked out of a sentence.
- EN
- Masked Language Modeling
- TR
- Maskeli Dil Modelleme
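The masking step can be sketched in numpy. Here roughly 15% of positions are replaced with a mask token (the BERT-style rate), and the loss is computed only at those positions; the token IDs and mask rate are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = np.array([12, 7, 99, 3, 41, 8])
MASK_ID = 0      # illustrative id for the [MASK] token
IGNORE = -100    # conventional "skip this position in the loss" value

# Choose ~15% of positions to mask; the model must predict the originals.
mask = rng.random(tokens.shape) < 0.15
corrupted = np.where(mask, MASK_ID, tokens)   # model input
targets = np.where(mask, tokens, IGNORE)      # loss computed only where masked
```

Unlike next-token prediction, this objective lets the model condition on context from both sides of the masked position.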
Model FLOPs Utilization (MFU)
The fraction of the hardware's theoretical peak FLOP/s that a training run actually achieves; a key efficiency metric.
- EN
- Model FLOPs Utilization (MFU)
- TR
- Model FLOPs Kullanımı (MFU)
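MFU is just achieved FLOP/s divided by the hardware's peak FLOP/s, with achieved FLOPs for a dense transformer commonly estimated as ~6 x parameters x tokens (forward plus backward). A back-of-envelope sketch; all numbers are illustrative assumptions, not measurements:

```python
# Illustrative values for a single accelerator:
params = 7e9            # 7B-parameter dense model
tokens_per_step = 32768
step_time_s = 11.0
peak_flops = 312e12     # e.g. an A100's dense BF16 peak FLOP/s

# Common estimate: ~6 FLOPs per parameter per token (forward + backward).
achieved_flops_per_s = 6 * params * tokens_per_step / step_time_s
mfu = achieved_flops_per_s / peak_flops
```

Well-tuned large-scale training runs typically land somewhere in the 30-55% MFU range; the rest is lost to memory bandwidth, communication, and kernel overheads.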
Post-training
The stage after pre-training that turns a raw model into a helpful, safe, instruction-following assistant.
- EN
- Post-training
- TR
- Eğitim Sonrası (Post-training)
Fine-tuning
Adapting a pre-trained model to a specific task using smaller, targeted data.
- EN
- Fine-tuning
- TR
- İnce Ayar (Fine-tuning)
QLoRA
A LoRA variant that combines 4-bit quantisation of the frozen base weights with low-rank adapters, making it possible to fine-tune a 65B model on a single 48 GB GPU.
- EN
- QLoRA
- TR
- QLoRA
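A back-of-envelope check shows why the quantisation matters: a 65B-parameter model's weights alone do not fit on one 48 GB GPU in 16-bit, but do at 4-bit. The byte counts below cover weights only and ignore adapter, activation, and optimiser overhead:

```python
params = 65e9

# Weight memory only (GiB): 2 bytes per parameter in fp16/bf16,
# 0.5 bytes per parameter at 4-bit (NF4).
fp16_gib = params * 2.0 / 2**30   # ~121 GiB: does not fit on 48 GB
nf4_gib = params * 0.5 / 2**30    # ~30 GiB: fits, leaving room for LoRA adapters
```

Gradients then flow only through the small LoRA adapters, which are kept in higher precision, so the frozen 4-bit base weights never need optimiser state.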