Alignment is the problem of making an AI system's goals and behavior conform to human values, user intent, and societal norms. Posed theoretically in the early writings of Stuart Russell and Nick Bostrom, it became a practical engineering problem from 2017 onward, first through OpenAI's work on learning from human feedback and later through Anthropic's research agenda. Methods such as RLHF, DPO, RLAIF, and Constitutional AI are concrete ways to train an LLM to follow instructions and reflect intended values. Alignment is not a single step but an ongoing process: evals, red teaming, and behavior monitoring are all part of the loop.
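As one concrete instance of these methods, here is a minimal sketch of the DPO loss (Rafailov et al., 2023) in PyTorch. The function name and arguments are illustrative, not from any particular library, and it assumes the summed log-probabilities of each preferred ("chosen") and dispreferred ("rejected") completion have already been computed under both the policy being trained and a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss over a batch of preference pairs.

    Each argument is a 1-D tensor of summed token log-probabilities that
    the policy / frozen reference model assigns to the chosen (preferred)
    and rejected completions.
    """
    # Log-ratio of policy to reference for each completion.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # Widen the margin between chosen and rejected, scaled by beta.
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()

# Toy usage with dummy log-probabilities for a batch of two prompts:
loss = dpo_loss(torch.tensor([-4.2, -3.1]), torch.tensor([-5.0, -4.8]),
                torch.tensor([-4.5, -3.3]), torch.tensor([-4.9, -4.7]))
```

The appeal of DPO over classic RLHF is visible in the signature: there is no separate reward model and no RL loop, only a supervised loss over preference pairs.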
Glossary · Beginner · 2017
Alignment
The problem of making an AI system's goals and behavior align with human values and intent.
- EN: Alignment
- TR: Hizalama (Alignment)