Chatbot Arena is a public evaluation platform launched in mid-2023 by the LMSYS team at UC Berkeley. Human users submit a prompt to two anonymous models side by side and vote for the response they prefer; the votes are aggregated into an Elo-style rating leaderboard. With millions of votes collected, it has become one of the closest available proxies for real-world human preference across frontier models, and as static benchmarks like MMLU saturated, community attention shifted toward the Arena. It has limitations of its own: style bias in voting, the distribution of prompt types, and the mix of countries and users all influence the rankings.
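The pairwise-vote-to-leaderboard idea can be sketched with the classic Elo update rule, where the winner of each comparison gains rating and the loser gives up the same amount. This is a minimal illustration, not the Arena's actual pipeline (LMSYS has used more elaborate statistical models than plain sequential Elo); the model names, starting ratings, K-factor, and vote log below are all hypothetical.

```python
def elo_update(r_winner, r_loser, k=32):
    """One pairwise Elo update: the winner gains what the loser loses."""
    # Expected win probability for the winner given the rating gap.
    expected_win = 1 / (1 + 10 ** ((r_loser - r_winner) / 400))
    delta = k * (1 - expected_win)  # large upsets move ratings more
    return r_winner + delta, r_loser - delta

# Both models start at a common baseline rating (illustrative values).
ratings = {"model_a": 1000.0, "model_b": 1000.0}

# Hypothetical vote log: each entry names the model the human preferred.
votes = ["model_a", "model_a", "model_b", "model_a"]
for winner in votes:
    loser = "model_b" if winner == "model_a" else "model_a"
    ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser])
```

Because each update is zero-sum, the total rating mass is conserved; the leaderboard ordering emerges purely from who beats whom and by how much the result surprises the current ratings.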