It makes it look like the presentation was rushed or made at the last minute. Really bad to see this as the first plot in the whole presentation. Also, I would have loved to see comparisons with Opus 4.1.
Edit: Opus 4.1 scores 74.5% (https://www.anthropic.com/news/claude-opus-4-1). This makes it sound like Anthropic released the upgrade to still be the leader on this important benchmark.
After reading around, it seems like they probably forgot to update/swap the slides before the presentation. The graphs on their website were correct at launch, but the ones used in the presentation were probably older versions they had forgotten to fix.
haffi112|6 months ago
danpalmer|6 months ago
Or written by GPT-5?
herval|6 months ago
ozgung|6 months ago
https://imgur.com/a/QkriFco
ileonichwiesz|6 months ago
TrackerFF|6 months ago
rrrrrrrrrrrryan|6 months ago
moritzwarhier|6 months ago
silverquiet|6 months ago
Sateeshm|6 months ago
lysecret|6 months ago
bufferoverflow|6 months ago
[deleted]
artemonster|6 months ago
dang|6 months ago
You may not owe people who you feel are idiots better, but you owe this community better if you're participating in it.
https://news.ycombinator.com/newsguidelines.html