item 44827548 (no title)
z7|6 months ago GPT-5 is #1 on WebDev Arena with +75 pts over Gemini 2.5 Pro and +100 pts over Claude Opus 4: https://lmarena.ai/leaderboard
virgildotcodes|6 months ago This same leaderboard lists a bunch of models, including 4o, beating out Opus 4, which seems off.
afro88|6 months ago In my experience Opus 4 isn't as good for day-to-day coding tasks as Sonnet 4. It's better as a planner.
zamadatix|6 months ago "+100 points" sounds like a lot until you do the Elo math and see that it means roughly 1 out of 3 people still preferred Claude Opus 4's response. Remember, 1 out of 2 would place the models dead even.
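The Elo math mentioned above can be sketched in a few lines. This is only an illustration: it assumes the leaderboard scores behave like classic Elo ratings (logistic curve, base 10, scale 400) and that ties are ignored; the function name `elo_win_probability` is made up for this example and is not taken from the leaderboard's code.

```python
def elo_win_probability(rating_gap: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head votes won by the higher-rated model.

    Assumes the standard Elo logistic formula; the arena's actual
    scoring method may differ in detail.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


if __name__ == "__main__":
    # Gaps quoted in the thread: +75 (vs Gemini 2.5 Pro), +100 (vs Claude Opus 4).
    for gap in (0, 75, 100):
        p = elo_win_probability(gap)
        print(f"+{gap} pts: leader preferred {p:.1%}, other model {1 - p:.1%}")
```

Under those assumptions a +100 gap works out to the leader being preferred about 64% of the time, so the other model still wins a bit more than 1 vote in 3; a +75 gap is closer to 61/39.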
degrews|6 months ago That eval hasn't been relevant for a while now. Performance there just doesn't seem to correlate well with real-world performance.
Too|6 months ago What does +75 arbitrary points mean in practice? Can we come up with units that relate to something in the real world?