jorl17 | 12 days ago
It is a far cry from Opus 4.6.
Opus 4.6 was (is!) a giant leap, the largest since Gemini 2.5 Pro. It didn't hallucinate anything and produced honestly mind-blowing analyses of the collection as a whole.
Sonnet 4.6 feels like an evolution of whatever the previous models were doing. It is marginally better, in the sense that it seemed to make fewer mistakes, or less severe ones, but ultimately it made all the usual mistakes (making things up, saying it'll quote a poem and then quoting another, getting time periods mixed up, etc.).
My initial experiments with coding leave me with the same feeling. It is better than previous similar models, but a long distance from Opus 4.6. And I've really been spoiled by Opus.
linolevan | 12 days ago
I like seeing this kind of analysis on new model releases. Any chance you could aggregate your opinions in one place, instead of the Hacker News comment sections for these releases?
majora2007 | 11 days ago
That said, I have had it try to debug something and just get stuck chugging tokens.
cube2222 | 11 days ago
My intuition is that this is just related to model size / its "working memory", and that it will likely be fixed neither by training Sonnet with Opus nor by steadily optimizing its agentic capabilities.
versteegen | 11 days ago
I saw something about Sonnet 4.6 having had a greatly increased amount of RL training compared to 4.5.
zarzavat | 11 days ago
For me, OpenAI is ahead in intelligence, and Anthropic is ahead in alignment. I use both, but for different tasks.
Given the pace of change, intuition is something of a liability: what's true today may not be true tomorrow. You have to constantly keep an open mind and try new things.
Listening to influencers is a waste of time.