(no title)

tylermw | 6 months ago

What's going on with this plot's y-axis?

https://bsky.app/profile/tylermw.com/post/3lvtac5hues2n

haffi112|6 months ago

It makes it look like the presentation is rushed or made last minute. Really bad to see this as the first plot in the whole presentation. Also, I would have loved to see comparisons with Opus 4.1.

Edit: Opus 4.1 scores 74.5% (https://www.anthropic.com/news/claude-opus-4-1). This makes it sound like Anthropic released the upgrade to still be the leader on this important benchmark.

danpalmer|6 months ago

> like the presentation is rushed or made last minute

Or written by GPT-5?

herval|6 months ago

They never compare with other vendors

ozgung|6 months ago

Also, this coding deception rate bar tries to deceive us.

https://imgur.com/a/QkriFco

ileonichwiesz|6 months ago

It’s beyond parody that they did something like this on a slide about deception. You couldn’t make this stuff up.

TrackerFF|6 months ago

After reading around, it seems like they probably forgot to update/swap the slides before the presentation. The graphs on their website were correct at launch, but the ones used in the presentation were probably older versions they had forgotten to fix.

rrrrrrrrrrrryan|6 months ago

This is hilarious

moritzwarhier|6 months ago

Probably created without thinking enabled. Lower % accuracy ensues, speaking from experience.

silverquiet|6 months ago

Probably generated by AI.

Sateeshm|6 months ago

If not, the person that made the chart just got $1.5M

lysecret|6 months ago

Couldn’t believe it was real haha
