top | item 42591930

deadmutex | 1 year ago

Interesting. On lmsys, Gemini is #1 for coding tasks. How does that compare?

https://lmarena.ai/?leaderboard

nathanasmith|1 year ago

For the lmarena leaderboard to be really useful, you need to click the "Style Control" button so that it normalizes for LLMs that generate longer answers, etc. Humans may find those answers more stylistically pleasing and upvote them, but they often end up being worse. When you do that, o1 comes out on top, followed by o1-preview, then Sonnet 3.5, with Gemini Preview 1206 in fourth place.

MacsHeadroom|1 year ago

lmsys is a poor judge of coding quality since it is based on ratings from a single generation rather than agentic coding over multiple steps.