Synthetic data is training data generated by another model rather than collected from the real world. Microsoft's Phi series, Anthropic's Constitutional AI pipeline, and recent OpenAI models all rely on it heavily. Used well, it can quickly improve output quality and teach reasoning patterns; used poorly, it can lead to model collapse, where a model trained on model-generated output degrades in quality. With the rise of reasoning models, generating step-by-step solution traces has become an especially important use case for synthetic data.
MEVZU · N°124 · ISTANBUL · YEAR I, VOL. III
Glossary · Intermediate · 2023
Synthetic Data
Training data generated by another model rather than (or in addition to) real-world data.
- EN (English term): Synthetic Data
- TR (Turkish term): Sentetik Veri
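To make the idea of synthetic step-by-step solution traces concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `stub_teacher` is a hypothetical stand-in for a real model call (it deliberately makes occasional mistakes), and `Trace` and `build_dataset` are invented names. The filtering step, which keeps only traces whose final answer can be checked independently and drops duplicate questions, is one plausible way to guard against low-quality synthetic data, not a method this entry describes.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Trace:
    """One synthetic training example: a question, its worked steps, and an answer."""
    question: str
    operands: tuple[int, int]
    steps: list[str]
    answer: int


def stub_teacher(a: int, b: int, rng: random.Random, error_rate: float = 0.2) -> Trace:
    """Hypothetical stand-in for a teacher model that writes a solution trace.

    It solves a + b by adding tens and ones separately, and with
    probability `error_rate` it corrupts the final answer to simulate
    a model mistake.
    """
    tens = (a // 10 + b // 10) * 10
    ones = a % 10 + b % 10
    answer = tens + ones
    if rng.random() < error_rate:
        answer += rng.choice([-10, -1, 1, 10])  # simulated error
    steps = [
        f"Add the tens: {a // 10 * 10} + {b // 10 * 10} = {tens}",
        f"Add the ones: {a % 10} + {b % 10} = {ones}",
        f"Combine: {tens} + {ones} = {answer}",
    ]
    return Trace(f"What is {a} + {b}?", (a, b), steps, answer)


def build_dataset(n: int, seed: int = 0) -> list[Trace]:
    """Generate up to n traces, keeping only unique questions with verified answers."""
    rng = random.Random(seed)
    seen: set[str] = set()
    kept: list[Trace] = []
    for _ in range(n):
        a, b = rng.randint(10, 99), rng.randint(10, 99)
        trace = stub_teacher(a, b, rng)
        if trace.question in seen:
            continue  # skip duplicate questions to preserve diversity
        seen.add(trace.question)
        if trace.answer == a + b:  # independent check of the final answer
            kept.append(trace)
    return kept


if __name__ == "__main__":
    dataset = build_dataset(200)
    print(f"kept {len(dataset)} verified traces")
    print(dataset[0].question)
    for step in dataset[0].steps:
        print("  ", step)
```

In a real pipeline the stub would be replaced by calls to an actual model, and the exact-answer check by whatever verifier the task allows (unit tests for code, a reference solver for math, and so on).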