Show HN: Llmfao – Human-Ranked LLM Leaderboard with Sixty Models
2 points | scoresmoke | 2 years ago | dustalov.github.io
I also wrote a detailed post describing the methodology and analysis: https://evalovernite.substack.com/p/llmfao-human-ranking
[1]: https://twitter.com/_jasonwei/status/1707104739346043143
[2]: https://benchmarks.llmonitor.com/
Unfortunately, I did my analysis before the Mistral AI model was released, but published it after the model was released. I'd be happy to add it to the comparison if I had their completions.
maxrmk|2 years ago
scoresmoke|2 years ago
The only manual analysis was when I checked the passed/failed prompts of the top-performing model.