MEVZU N° TAG / VOL. 093
An evaluation method that shows two models' answers to the same prompt and asks which one is better.
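One common way to summarize such pairwise verdicts is a win rate. The sketch below is only an illustration, not a method this entry prescribes: the labels "A", "B", and "tie", the function name `win_rate`, and the choice to count a tie as half a win are all assumptions.

```python
from collections import Counter

# Hypothetical verdict labels; the entry itself does not define a label set.
VALID_VERDICTS = {"A", "B", "tie"}


def win_rate(verdicts):
    """Return model A's win rate over a list of pairwise verdicts.

    Each verdict is "A", "B", or "tie". A tie counts as half a win,
    which is one convention among several.
    """
    verdicts = list(verdicts)
    if not verdicts:
        raise ValueError("need at least one verdict")
    unknown = set(verdicts) - VALID_VERDICTS
    if unknown:
        raise ValueError(f"unknown verdict labels: {sorted(unknown)}")
    counts = Counter(verdicts)
    return (counts["A"] + 0.5 * counts["tie"]) / len(verdicts)


if __name__ == "__main__":
    # Four comparisons: A wins twice, B wins once, one tie.
    print(win_rate(["A", "A", "B", "tie"]))
```

In practice the two answers are often shown in both orders, since a judge may favor whichever answer comes first.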
An evaluation method in which an LLM judges another model's output.
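A minimal sketch of the idea, under stated assumptions: the judge is passed in as a plain callable (`call_llm`, hypothetical) that takes a prompt string and returns the judge's reply, so no particular vendor API is implied. The 1 to 5 scale, the template wording, and the digit-parsing rule are also illustrative choices, not part of the definition.

```python
from typing import Callable

# Hypothetical judging template; real rubrics are usually more detailed.
JUDGE_TEMPLATE = (
    "Rate the response to the prompt on a scale of 1 to 5.\n"
    "Prompt: {prompt}\n"
    "Response: {response}\n"
    "Answer with a single digit."
)


def judge_output(prompt: str, response: str, call_llm: Callable[[str], str]) -> int:
    """Ask a judge LLM to score `response` and return the score as an int.

    `call_llm` is an assumed stand-in for whatever client reaches the
    judge model. The first digit from 1 to 5 in the reply is the score.
    """
    reply = call_llm(JUDGE_TEMPLATE.format(prompt=prompt, response=response))
    for ch in reply:
        if ch in "12345":
            return int(ch)
    raise ValueError(f"no score found in judge reply: {reply!r}")


if __name__ == "__main__":
    # A stub judge that always answers 4, standing in for a real model.
    print(judge_output("What is 2 + 2?", "4", lambda _: "Score: 4"))
```

Passing the judge as a callable keeps the sketch testable with a stub and leaves the choice of model open.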