redman25 | 2 months ago

https://www.swebench.com

https://swe-rebench.com

https://livebench.ai/#/

https://eqbench.com/#

https://contextarena.ai/?needles=8

https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...

https://artificialanalysis.ai/leaderboards/models

https://gorilla.cs.berkeley.edu/leaderboard.html

https://github.com/lechmazur/confabulations

https://dubesor.de/benchtable

https://help.kagi.com/kagi/ai/llm-benchmark.html

https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard

Alifatisk | 2 months ago

I’d stick to Artificial Analysis.

pylotlight | 2 months ago

That has many of its own problems as well.