RLAIF keeps the core idea of RLHF but replaces human labellers with another LLM as the source of preference signals. Anthropic systematised the approach in its 2022 Constitutional AI work, where a model was given a 'constitution' and asked to critique and revise its own outputs against those principles, producing preference data that fed the subsequent RL phase. The upside is that it scales far more cheaply than human labelling; the risk is that model biases and blind spots can be self-reinforced. Most modern post-training pipelines now blend human and AI feedback rather than picking one or the other.
MEVZU N°124 · ISTANBUL · YEAR I — VOL. III
Glossary · Advanced · 2022
RLAIF — RL from AI Feedback
An alignment approach that uses another LLM, instead of human labellers, as the source of preference signals.
- EN — RLAIF (RL from AI Feedback)
- TR — RLAIF (AI Geri Bildirimiyle Pekiştirmeli Öğrenme)
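The critique-and-revise loop described above can be sketched in a few lines. This is a minimal illustration, not Anthropic's actual pipeline: `call_model` is a hypothetical stand-in for any LLM API, stubbed here with canned responses so the example runs offline, and the single constitutional principle is invented for the sketch.

```python
# Hedged sketch of the RLAIF preference-data step: draft -> critique
# against a principle -> revision; the (revision, draft) pair becomes
# AI-labelled preference data for the subsequent RL phase.

CONSTITUTION = [
    "Choose the response that is most helpful, honest, and harmless.",
]

def call_model(prompt: str) -> str:
    # Hypothetical stub: a real pipeline would query an LLM here.
    # Checked in this order because the revision prompt also mentions
    # the critique.
    if "Revise" in prompt:
        return "Revised, more cautious answer."
    if "Critique" in prompt:
        return "The draft could be more cautious."
    return "Initial draft answer."

def make_preference_pair(user_prompt: str, principle: str) -> dict:
    draft = call_model(user_prompt)
    critique = call_model(
        f"Critique this response against the principle "
        f"'{principle}':\n{draft}"
    )
    revision = call_model(
        f"Revise the response to address the critique.\n"
        f"Critique: {critique}\nResponse: {draft}"
    )
    # The revision is labelled "preferred" over the draft; these pairs
    # train the reward model used in the RL phase.
    return {"prompt": user_prompt, "chosen": revision, "rejected": draft}

pair = make_preference_pair("How do I pick a strong password?", CONSTITUTION[0])
```

In a real pipeline the canned strings would be live model calls and the pairs would be accumulated into a dataset; the key point is that no human labeller appears anywhere in the loop.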