eaf7e281 | 24 days ago

There's no way they actually work on training this.

fragmede|24 days ago

The people who work at Anthropic are aware of simonw and his test, and people aren't unthinking data-driven machines. However valid his test is or isn't, a better score on it is convincing. If it gets, say, 1,000 people to use Claude Code over Codex, how much would that be worth to Anthropic?

$200 * 1,000 = $200k/month.

I'm not saying they are. But saying with such certainty that they aren't, when money is on the line, seems like a questionable conclusion, unless you have some insider knowledge you'd like to share with the rest of the class.

margalabargala|24 days ago

I suspect they're training on this.

I asked Opus 4.6 for a pelican riding a recumbent bicycle and got this.

https://i.imgur.com/UvlEBs8.png

WarmWash|24 days ago

It would be way, way better if they were benchmaxxing this. The pelican in the image (both images) has arms. Pelicans don't have arms, and a pelican riding a bike would use its wings.

mrandish|24 days ago

Interesting that it seems better. Maybe adding a highly specific yet unusual qualifier focuses attention?

TheDong|24 days ago

I don't think that really proves anything. It's unsurprising that recumbent bicycles are represented less in the training data, so it's less able to produce them.

Try something that's roughly equally popular, like a turkey riding a scooter, or a yak driving a tractor.

riffraff|24 days ago

Perhaps try a penny-farthing?

KeplerBoy|24 days ago

There is no way they are not training on this.

collinmanderson|24 days ago

I suspect they focus on generic SVG drawing.