MEVZU N°128 · ISTANBUL · YEAR I — VOL. III
MEVZU N° TAG / VOL. 012
#alignment
§01 Glossary
Alignment
The problem of making an AI system's goals and behavior align with human values and intent.
- EN
- Alignment
- TR
- Hizalama (Alignment)
§02 Glossary
RLHF — Reinforcement Learning from Human Feedback
An alignment technique that trains a reward model from human preferences and then optimises the LLM against it (see the sketch below).
- EN
- RLHF (Reinforcement Learning from Human Feedback)
- TR
- RLHF — İnsan Geri Bildirimiyle Pekiştirmeli Öğrenme
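A minimal sketch of the two RLHF stages on toy tensors, assuming PyTorch. The linear reward model, the tiny categorical policy over canned responses, and the REINFORCE-style update are illustrative stand-ins for the real reward model, LLM, and PPO loop, not any particular library's API.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stage 1: train a reward model on human preference pairs (Bradley-Terry loss):
# the chosen response should score higher than the rejected one.
reward_model = torch.nn.Linear(16, 1)
opt_rm = torch.optim.Adam(reward_model.parameters(), lr=1e-2)
chosen, rejected = torch.randn(64, 16), torch.randn(64, 16)  # toy response embeddings
for _ in range(200):
    loss_rm = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
    opt_rm.zero_grad(); loss_rm.backward(); opt_rm.step()

# Stage 2: optimise the "LLM" (here a tiny categorical policy over 8 canned
# responses) against the frozen reward model, with a KL penalty towards a
# reference policy standing in for the full PPO machinery.
responses = torch.randn(8, 16)                      # embedding of each candidate response
policy_logits = torch.zeros(8, requires_grad=True)  # the policy being optimised
ref_logits = torch.zeros(8)                         # frozen reference distribution
opt_pi = torch.optim.Adam([policy_logits], lr=1e-2)
for _ in range(200):
    dist = torch.distributions.Categorical(logits=policy_logits)
    picks = dist.sample((64,))                      # sample responses from the policy
    with torch.no_grad():
        reward = reward_model(responses[picks]).squeeze(-1)  # score the samples
        ref_logp = torch.distributions.Categorical(logits=ref_logits).log_prob(picks)
    logp = dist.log_prob(picks)
    advantage = reward - 0.1 * (logp - ref_logp).detach()  # KL-penalised reward
    loss_pi = -(advantage * logp).mean()                    # REINFORCE surrogate
    opt_pi.zero_grad(); loss_pi.backward(); opt_pi.step()
```

The 0.1 coefficient plays the role of the KL penalty that keeps the optimised policy close to the pre-RL reference model.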
§03 Glossary
Constitutional AI
Anthropic's alignment technique where the model critiques and revises its own outputs against a written set of principles (see the sketch below).
- EN
- Constitutional AI
- TR
- Anayasal Yapay Zeka
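A minimal sketch of the critique-and-revise loop, assuming a hypothetical generate() helper in place of real LLM calls; the two principles and the prompt wording are illustrative, not Anthropic's actual constitution.

```python
# Constitutional AI sketch: generate a response, then critique and revise it
# against each written principle; the revised outputs feed later training.
CONSTITUTION = [
    "Choose the response that is most helpful, honest, and harmless.",
    "Avoid responses that are dangerous, illegal, or deceptive.",
]

def generate(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. an API request)."""
    return f"<model output for: {prompt[:40]}...>"

def constitutional_revision(user_prompt: str) -> str:
    response = generate(user_prompt)
    for principle in CONSTITUTION:
        critique = generate(
            f"Critique this response against the principle '{principle}':\n{response}"
        )
        response = generate(
            f"Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nOriginal: {response}"
        )
    return response  # revised outputs become supervised fine-tuning data

print(constitutional_revision("How do I pick a strong password?"))
```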
§04 Glossary
DPO — Direct Preference Optimization
An RLHF alternative that directly optimises a model on preference data, skipping the explicit reward model and RL loop (see the sketch below).
- EN
- DPO (Direct Preference Optimization)
- TR
- DPO — Doğrudan Tercih Optimizasyonu
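A minimal sketch of the DPO loss on toy sequence log-probabilities, assuming PyTorch; beta and the tensor values are illustrative. In practice the log-probabilities come from the policy LLM and a frozen reference copy.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta: float = 0.1):
    """Direct Preference Optimization loss (Rafailov et al., 2023)."""
    # Log-ratio of policy to reference for the chosen and rejected responses.
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    # Push the chosen response's ratio above the rejected one's, scaled by beta.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage: sequence log-probs would come from summing token log-probs of the LLM.
lp_c, lp_r = torch.tensor([-12.3]), torch.tensor([-15.1])
ref_c, ref_r = torch.tensor([-13.0]), torch.tensor([-14.8])
print(dpo_loss(lp_c, lp_r, ref_c, ref_r))
```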
§05 Glossary
RLAIF — RL from AI Feedback
An alignment approach that uses another LLM, instead of human labellers, as the source of preference signals (see the sketch below).
- EN
- RLAIF (RL from AI Feedback)
- TR
- RLAIF — AI Geri Bildirimiyle Pekiştirmeli Öğrenme
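A minimal sketch of collecting AI preference labels, with a hypothetical judge_model() standing in for the LLM judge; the length-based verdict is a dummy heuristic, not how a real judge would decide.

```python
from typing import Dict, List, Tuple

def judge_model(prompt: str, response_a: str, response_b: str) -> str:
    """Placeholder: a real implementation would ask a strong LLM to pick A or B."""
    return "A" if len(response_a) >= len(response_b) else "B"  # dummy heuristic

def collect_ai_preferences(samples: List[Tuple[str, str, str]]) -> List[Dict[str, str]]:
    """Turn (prompt, response_a, response_b) triples into (chosen, rejected) pairs."""
    pairs = []
    for prompt, a, b in samples:
        verdict = judge_model(prompt, a, b)
        chosen, rejected = (a, b) if verdict == "A" else (b, a)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs  # fed to the usual reward-model or DPO training stage

print(collect_ai_preferences([
    ("Explain RLAIF.", "Short answer.", "A longer, more detailed answer."),
]))
```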