(no title)
dwohnitmok | 10 days ago
This benchmark doesn't have the latest models from the last two months, but Gemini 3 (with no tools) is already at 1750 - 1800 FIDE, which is approximately probably around 1900 - 2000 USCF (about USCF expert level). This is enough to beat almost everyone at your local chess club.
runarberg|10 days ago
Additionally, how do we know the model isn’t benchmaxxed to eliminate illegal moves.
For example, here is the list of games by Gemini-3-pro-preview. In 44 games it preformed 3 illegal moves (if I counted correctly) but won 5 because opponent forfeits due to illegal moves.
https://chessbenchllm.onrender.com/games?page=5&model=gemini...
I suspect the ratings here may be significantly inflated due to a flaw in the methodology.
EDIT: I want to suggest a better methodology here (I am not gonna do it; I really really really don’t care about this technology). Have the LLMs play rated engines and rated humans, the first illegal move forfeits the game (same rules apply to humans).
dwohnitmok|10 days ago
The rest is taken care of by elo. That is they then play each other as well, but it is not really possible for Gemini to have a higher elo than maia with such a small sample size (and such weak other LLMs).
Elo doesn't let you inflate your score by playing low ranked opponents if there are known baselines (rated engines) because the rated engines will promptly crush your elo.
You could add humans into the mix, the benchmark just gets expensive.
emp17344|10 days ago
cesarvarela|10 days ago
dwohnitmok|10 days ago
Whether or not we'll see LLMs continue to get a lower error rate to make up for those orders of magnitude remains to be seen (I could see it go either way in the next two years based on the current rate of progress).
famouswaffles|10 days ago
deadbabe|10 days ago
The correct solution is to have a conventional chess AI as a tool and use the LLM as a front end for humanized output. A software engineer who proposes just doing it all via raw LLM should be fired.
rodiger|10 days ago
The point isn't that LLMs are the best AI architecture for chess.
overgard|10 days ago
jimbokun|10 days ago