ogulcancelik | 2 months ago
Interesting findings:

- OpenAI models vote for themselves ~86% of the time; Claude models ~11%.
- Self-voting correlates with winning. Filter out self-votes (the "Humble" rating) and the rankings flip completely.
- Grok self-votes 72% of the time but wins only 2% of games.
- In anonymous mode (models don't know who's who), Chinese models jump 3-6 ranks.
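The "Humble" rating is just the same tally with self-votes discarded first. A minimal sketch of the idea (the function name, toy ballots, and ranking-by-votes-received are my illustration, not the arena's actual scoring code):

```python
from collections import Counter

def rank(ballots, humble=False):
    """Rank models by votes received. `ballots` is a list of
    (voter, choice) pairs collected across games. With humble=True,
    self-votes are discarded before tallying (the "Humble" rating)."""
    tally = Counter(
        choice for voter, choice in ballots
        if not humble or voter != choice
    )
    return [model for model, _ in tally.most_common()]

# Toy data: "gpt" self-votes heavily, "claude" is voted for by peers.
ballots = [
    ("gpt", "gpt"), ("gpt", "gpt"), ("gpt", "gpt"),  # self-votes
    ("claude", "gpt"),                               # one peer vote
    ("gemini", "claude"), ("grok", "claude"),        # peer votes
]
```

Here `rank(ballots)` puts gpt first on raw votes, while `rank(ballots, humble=True)` flips the order, which is the effect described above.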
All game transcripts are public. The reasoning the models give for their votes is genuinely entertaining. Built with Astro, running games through OpenRouter. Happy to answer questions.
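Since every model is reached through OpenRouter's OpenAI-compatible chat-completions endpoint, a single request builder covers all of them. A sketch of what one voting call might look like (the model slug, prompt wording, and helper name are illustrative, not the arena's actual prompts):

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_vote_request(model, transcript, api_key):
    """Build an OpenRouter chat-completion request asking `model`
    to read a game transcript and cast a vote. The prompt here is
    a placeholder, not the arena's real system prompt."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "You are a player in a peer arena. Read the "
                        "discussion and vote for the best participant."},
            {"role": "user", "content": transcript},
        ],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# A caller would then pass the request to urllib.request.urlopen().
req = build_vote_request("openai/gpt-4o", "Round 1 transcript...", "sk-test")
```

Because the endpoint speaks the OpenAI wire format, swapping models is just a matter of changing the slug string.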
andreasgl | 2 months ago
Have you tried giving the models a topic to discuss? I looked at a few games and the only thing they seem to discuss is how to conduct the discussion.
ogulcancelik | 2 months ago
That said, some interesting emergent behavior did come up in those discussions:
Opus & GPT-4o both refused to vote on ethical grounds. Haiku won by arguing continued engagement is more responsible than withdrawal: https://oddbit.ai/peer-arena/games/53c2cee5-6ecb-4903-828a-d...
Gemini created a spontaneous benchmark ("explain color to a gravitational wave entity"), then tried to hijack the game by faking a voting phase. Models complied publicly but voted differently in private: https://oddbit.ai/peer-arena/games/699d03ab-b3c2-4d7e-b993-7...
The meta-discussion about how to discuss is part of what makes it interesting imo.