MEVZU N° TAG / VOL. 160
#training
0 blog · 0 news · 10 wiki
Wiki
Synthetic Data
Training data generated by another model rather than (or in addition to) real-world data.
- EN
- Synthetic Data
- TR
- Sentetik Veri
RLHF — Reinforcement Learning from Human Feedback
An alignment technique that trains a reward model from human preferences and then optimises the LLM against it.
- EN
- RLHF (Reinforcement Learning from Human Feedback)
- TR
- RLHF — İnsan Geri Bildirimiyle Pekiştirmeli Öğrenme
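The first step described above, training a reward model from human preferences, is typically done with a pairwise (Bradley-Terry) loss. A minimal numpy sketch, assuming scalar reward scores for a chosen and a rejected response (the function name and inputs are illustrative, not from any specific library):

```python
import numpy as np

def reward_model_loss(r_chosen, r_rejected):
    """Pairwise (Bradley-Terry) loss for reward-model training:
    the human-preferred response should score higher than the rejected one."""
    margin = r_chosen - r_rejected
    return -np.log(1.0 / (1.0 + np.exp(-margin)))  # -log sigmoid(margin)

# A larger score margin between chosen and rejected yields a smaller loss.
small_margin = reward_model_loss(0.5, 0.0)
large_margin = reward_model_loss(2.0, 0.0)
```

The trained reward model then provides the optimisation signal for the RL stage (commonly PPO).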
LoRA (Low-Rank Adaptation)
A fine-tuning technique that trains only small low-rank matrices instead of every weight, dramatically cutting memory.
- EN
- LoRA (Low-Rank Adaptation)
- TR
- LoRA (Düşük-Mertebeli Adaptasyon)
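The idea above can be sketched in a few lines of numpy: the pre-trained weight W stays frozen, and only two small low-rank factors are trained. The dimensions here are toy values for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                         # hidden size and low rank (r << d)

W = rng.normal(size=(d, d))         # frozen pre-trained weight, never updated
A = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection, zero-initialised
                                    # so the adapted model starts identical to W

x = rng.normal(size=(d,))
y = x @ (W + B @ A).T               # adapted forward pass: (W + BA) x

# Only A and B are trained: 2*d*r parameters instead of d*d.
trainable = A.size + B.size
```

With r much smaller than d, the trainable parameter count drops from d² to 2dr, which is where the memory savings come from.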
DPO — Direct Preference Optimization
An RLHF alternative that directly optimises a model on preference data, skipping the explicit RL loop.
- EN
- DPO (Direct Preference Optimization)
- TR
- DPO — Doğrudan Tercih Optimizasyonu
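The DPO objective can be written directly from log-probabilities, with no reward model or RL loop. A minimal sketch for a single preference pair, assuming summed token log-probabilities from the policy and a frozen reference model (the function name is illustrative):

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss: push the policy to prefer the chosen response
    by a larger margin than the frozen reference model does."""
    margin = beta * ((logp_chosen - ref_chosen)
                     - (logp_rejected - ref_rejected))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))  # -log sigmoid(margin)

# When the policy exactly matches the reference, the margin is 0
# and the loss is -log(0.5).
loss = dpo_loss(logp_chosen=-5.0, logp_rejected=-7.0,
                ref_chosen=-5.0, ref_rejected=-7.0)
```

The beta hyperparameter controls how far the policy may drift from the reference model.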
Pre-training
The initial training phase, where a model learns general language ability from a massive corpus of largely unlabeled text, often trillions of tokens.
- EN
- Pre-training
- TR
- Ön Eğitim
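The dominant pre-training objective for LLMs is next-token prediction: at every position the target is simply the following token. A toy numpy sketch with a dummy uniform model (values are illustrative):

```python
import numpy as np

# Next-token prediction: shift the sequence by one to build targets.
tokens = np.array([5, 2, 9, 1])
inputs, targets = tokens[:-1], tokens[1:]

vocab = 10
logits = np.zeros((len(inputs), vocab))  # a dummy model: uniform over the vocab
logprobs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -logprobs[np.arange(len(targets)), targets].mean()
# A uniform model scores exactly log(vocab) in cross-entropy.
```

Driving this loss below log(vocab) across a huge corpus is what forces the model to learn language structure.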
Masked Language Modeling
A training objective where the model learns to predict tokens that have been masked out of a sentence.
- EN
- Masked Language Modeling
- TR
- Maskeli Dil Modelleme
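The masking step can be sketched in numpy. Here roughly 15% of positions are replaced with a mask token (the BERT-style rate), and the loss is computed only at those positions; the token IDs and mask rate are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = np.array([12, 7, 99, 3, 41, 8])
MASK_ID = 0      # illustrative id for the [MASK] token
IGNORE = -100    # conventional "skip this position in the loss" value

# Choose ~15% of positions to mask; the model must predict the originals.
mask = rng.random(tokens.shape) < 0.15
corrupted = np.where(mask, MASK_ID, tokens)   # model input
targets = np.where(mask, tokens, IGNORE)      # loss computed only where masked
```

Unlike next-token prediction, this objective lets the model condition on context from both sides of the masked position.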
Model FLOPs Utilization (MFU)
The fraction of the hardware's theoretical peak FLOP/s that a training run actually achieves; a key efficiency metric.
- EN
- Model FLOPs Utilization (MFU)
- TR
- Model FLOPs Kullanımı (MFU)
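MFU is just achieved FLOP/s divided by the hardware's peak FLOP/s, with achieved FLOPs for a dense transformer commonly estimated as ~6 x parameters x tokens (forward plus backward). A back-of-envelope sketch; all numbers are illustrative assumptions, not measurements:

```python
# Illustrative values for a single accelerator:
params = 7e9            # 7B-parameter dense model
tokens_per_step = 32768
step_time_s = 11.0
peak_flops = 312e12     # e.g. an A100's dense BF16 peak FLOP/s

# Common estimate: ~6 FLOPs per parameter per token (forward + backward).
achieved_flops_per_s = 6 * params * tokens_per_step / step_time_s
mfu = achieved_flops_per_s / peak_flops
```

Well-tuned large-scale training runs typically land somewhere in the 30-55% MFU range; the rest is lost to memory bandwidth, communication, and kernel overheads.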
Post-training
The stage after pre-training that turns a raw model into a helpful, safe, instruction-following assistant.
- EN
- Post-training
- TR
- Eğitim Sonrası (Post-training)
Fine-tuning
Adapting a pre-trained model to a specific task using smaller, targeted data.
- EN
- Fine-tuning
- TR
- İnce Ayar (Fine-tuning)
QLoRA
A LoRA variant that combines 4-bit quantisation of the frozen base weights with low-rank adapters, making it possible to fine-tune a 65B model on a single 48 GB GPU.
- EN
- QLoRA
- TR
- QLoRA
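A back-of-envelope check shows why the quantisation matters: a 65B-parameter model's weights alone do not fit on one 48 GB GPU in 16-bit, but do at 4-bit. The byte counts below cover weights only and ignore adapter, activation, and optimiser overhead:

```python
params = 65e9

# Weight memory only (GiB): 2 bytes per parameter in fp16/bf16,
# 0.5 bytes per parameter at 4-bit (NF4).
fp16_gib = params * 2.0 / 2**30   # ~121 GiB: does not fit on 48 GB
nf4_gib = params * 0.5 / 2**30    # ~30 GiB: fits, leaving room for LoRA adapters
```

Gradients then flow only through the small LoRA adapters, which are kept in higher precision, so the frozen 4-bit base weights never need optimiser state.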