MEVZU N°128 · ISTANBUL · YEAR I — VOL. III
MEVZU N° TAG / VOL. 012
#alignment
§01 Glossary
Alignment
The problem of making an AI system's goals and behavior align with human values and intent.
- EN
- Alignment
- TR
- Hizalama (Alignment)
§02 Glossary
RLHF — Reinforcement Learning from Human Feedback
An alignment technique that trains a reward model from human preferences and then optimises the LLM against it (see the sketch below).
- EN
- RLHF (Reinforcement Learning from Human Feedback)
- TR
- RLHF — İnsan Geri Bildirimiyle Pekiştirmeli Öğrenme
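A minimal sketch of the two RLHF stages on toy tensors, assuming PyTorch. The linear reward model, the tiny categorical policy over canned responses, and the REINFORCE-style update are illustrative stand-ins for the real reward model, LLM, and PPO loop, not any particular library's API.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stage 1: train a reward model on human preference pairs (Bradley-Terry loss):
# the chosen response should score higher than the rejected one.
reward_model = torch.nn.Linear(16, 1)
opt_rm = torch.optim.Adam(reward_model.parameters(), lr=1e-2)
chosen, rejected = torch.randn(64, 16), torch.randn(64, 16)  # toy response embeddings
for _ in range(200):
    loss_rm = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
    opt_rm.zero_grad(); loss_rm.backward(); opt_rm.step()

# Stage 2: optimise the "LLM" (here a tiny categorical policy over 8 canned
# responses) against the frozen reward model, with a KL penalty towards a
# reference policy standing in for the full PPO machinery.
responses = torch.randn(8, 16)                      # embedding of each candidate response
policy_logits = torch.zeros(8, requires_grad=True)  # the policy being optimised
ref_logits = torch.zeros(8)                         # frozen reference distribution
opt_pi = torch.optim.Adam([policy_logits], lr=1e-2)
for _ in range(200):
    dist = torch.distributions.Categorical(logits=policy_logits)
    picks = dist.sample((64,))                      # sample responses from the policy
    with torch.no_grad():
        reward = reward_model(responses[picks]).squeeze(-1)  # score the samples
        ref_logp = torch.distributions.Categorical(logits=ref_logits).log_prob(picks)
    logp = dist.log_prob(picks)
    advantage = reward - 0.1 * (logp - ref_logp).detach()  # KL-penalised reward
    loss_pi = -(advantage * logp).mean()                    # REINFORCE surrogate
    opt_pi.zero_grad(); loss_pi.backward(); opt_pi.step()
```

The 0.1 coefficient plays the role of the KL penalty that keeps the optimised policy close to the pre-RL reference model.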
§03 Glossary
Constitutional AI
Anthropic's alignment technique where the model critiques and revises its own outputs against a written set of principles (see the sketch below).
- EN
- Constitutional AI
- TR
- Anayasal Yapay Zeka
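A minimal sketch of the critique-and-revise loop, assuming a hypothetical generate() helper in place of real LLM calls; the two principles and the prompt wording are illustrative, not Anthropic's actual constitution.

```python
# Constitutional AI sketch: generate a response, then critique and revise it
# against each written principle; the revised outputs feed later training.
CONSTITUTION = [
    "Choose the response that is most helpful, honest, and harmless.",
    "Avoid responses that are dangerous, illegal, or deceptive.",
]

def generate(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. an API request)."""
    return f"<model output for: {prompt[:40]}...>"

def constitutional_revision(user_prompt: str) -> str:
    response = generate(user_prompt)
    for principle in CONSTITUTION:
        critique = generate(
            f"Critique this response against the principle '{principle}':\n{response}"
        )
        response = generate(
            f"Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nOriginal: {response}"
        )
    return response  # revised outputs become supervised fine-tuning data

print(constitutional_revision("How do I pick a strong password?"))
```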
§04 Glossary
DPO — Direct Preference Optimization
An RLHF alternative that directly optimises a model on preference data, skipping the explicit reward model and RL loop (see the sketch below).
- EN
- DPO (Direct Preference Optimization)
- TR
- DPO — Doğrudan Tercih Optimizasyonu
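A minimal sketch of the DPO loss on toy sequence log-probabilities, assuming PyTorch; beta and the tensor values are illustrative. In practice the log-probabilities come from the policy LLM and a frozen reference copy.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta: float = 0.1):
    """Direct Preference Optimization loss (Rafailov et al., 2023)."""
    # Log-ratio of policy to reference for the chosen and rejected responses.
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    # Push the chosen response's ratio above the rejected one's, scaled by beta.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage: sequence log-probs would come from summing token log-probs of the LLM.
lp_c, lp_r = torch.tensor([-12.3]), torch.tensor([-15.1])
ref_c, ref_r = torch.tensor([-13.0]), torch.tensor([-14.8])
print(dpo_loss(lp_c, lp_r, ref_c, ref_r))
```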
§05 Glossary
RLAIF — RL from AI Feedback
An alignment approach that uses another LLM, instead of human labellers, as the source of preference signals (see the sketch below).
- EN
- RLAIF (RL from AI Feedback)
- TR
- RLAIF — AI Geri Bildirimiyle Pekiştirmeli Öğrenme
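A minimal sketch of collecting AI preference labels, with a hypothetical judge_model() standing in for the LLM judge; the length-based verdict is a dummy heuristic, not how a real judge would decide.

```python
from typing import Dict, List, Tuple

def judge_model(prompt: str, response_a: str, response_b: str) -> str:
    """Placeholder: a real implementation would ask a strong LLM to pick A or B."""
    return "A" if len(response_a) >= len(response_b) else "B"  # dummy heuristic

def collect_ai_preferences(samples: List[Tuple[str, str, str]]) -> List[Dict[str, str]]:
    """Turn (prompt, response_a, response_b) triples into (chosen, rejected) pairs."""
    pairs = []
    for prompt, a, b in samples:
        verdict = judge_model(prompt, a, b)
        chosen, rejected = (a, b) if verdict == "A" else (b, a)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs  # fed to the usual reward-model or DPO training stage

print(collect_ai_preferences([
    ("Explain RLAIF.", "Short answer.", "A longer, more detailed answer."),
]))
```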