#benchmark

0 blog · 1 news · 1 wiki

§02

News

April 28, 2026 · AI Mevzuları Editor

Can AI Think? The "Human-Like" Claim About the Centaur Model Comes Under Fire

A Zhejiang University study finds that the Centaur model's 160-task performance likely stems from overfitting; the "pick option A" test exposes pattern matching, not understanding.

ai research

§03

Wiki

§01 Glossary

MMLU

A broad multiple-choice benchmark that tests knowledge and reasoning across 57 subjects.

EN: MMLU
TR: MMLU