(no title)
andrewchilds|12 days ago
I haven't seen a response from the Anthropic team about it.
I can't help but look at Sonnet 4.6 in the same light, and want to stick with 4.5 across the board until this issue is acknowledged and resolved.
donovandikaio|4 minutes ago
For now, my workflow will be claude-opus-4-5 for everyday tasks and Opus 4.6 for more complex work.
wongarsu|12 days ago
I've overall enjoyed 4.6. On many easy things it thinks less than 4.5, leading to snappier feedback. And 4.6 seems much more comfortable calling tools: it's much more proactive about looking at the git history to understand the history of a bug or feature, or about looking at online documentation for APIs and packages.
A recent Claude Code update explicitly offered me the option to change the reasoning level from high to medium, and for many people that seems to help with the overthinking. But for my tasks and medium-sized code bases (far beyond hobby but far below legacy enterprise) I've been very happy with the default setting. Or maybe it's about the prompting style; hard to say.
bjt12345|12 days ago
Opus 4.6 can be quite sassy at times. The other day I asked it if it was "buttering me up" and it candidly responded, "Hey, you asked me to help you write a report with that conclusion, not appraise it."
data-ottawa|12 days ago
Go to /models, select Opus, and the dim text at the bottom will tell you the reasoning level.
High reasoning is a big difference versus 4.5. 4.6 on high uses a lot of tokens for even small tasks, and if you have a large codebase it will fill almost all of the context window, then compact often.
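If you want to see how the reasoning budget translates into token spend outside of Claude Code, here's a minimal sketch against the Anthropic Messages API. The model id and budget numbers are just placeholders I picked for illustration:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-opus-4-6",  # placeholder id; use whatever alias your plan exposes
        max_tokens=4096,
        # Extended thinking: budget_tokens caps how much the model may spend reasoning.
        # A smaller budget is roughly what a "medium" effort setting amounts to.
        thinking={"type": "enabled", "budget_tokens": 2048},
        messages=[{"role": "user", "content": "Explain why this test is flaky."}],
    )

    # Thinking tokens are billed as output tokens, which is what eats context so quickly.
    print(response.usage.input_tokens, response.usage.output_tokens)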
Topfi|12 days ago
In either case, there has been an increase between 4.1 and 4.5, as well as now another jump with the release of 4.6. As mentioned, I haven't seen a 5x or 10x increase; a bit below a 50% increase for the same task was the maximum I saw. In general, when the input is more opaque or a better approach is possible, I do think using more tokens for a better overall result is the right approach.
In tasks which are well authored and do not contain such deficiencies, I have seen no significant difference in either direction in terms of pure token output numbers. However, with models being what they are, and given past hard-to-reproduce regressions and output quality differences that additionally only affected a specific subset of users, I cannot make a solid determination.
Regarding Sonnet 4.6, what I noticed is that the reasoning tokens are very different compared to any prior Anthropic models. They start out far more structured, but then consistently turn more verbose, akin to a Google model.
weinzierl|12 days ago
(Currently I can use Sonnet 4.5 under More models, so I guess the above was just a glitch)
hedora|12 days ago
Those suggest opposite things about Anthropic's profit margins.
I’m not convinced 4.6 is much better than 4.5. The big discontinuous breakthroughs seem to be due to how my code and tests are structured, not model bumps.
ctoth|12 days ago
I have a protocol called "foreman protocol" where the main agent only dispatches other agents with prompt files and reads report files from the agents rather than relying on the janky subagent communication mechanisms such as task output.
What this has also given me is a history of what was built and why, because I have a list of the prompts that were tasked to the subagents. With Opus 4.5, the foreman would often leave the figuring-out part to the agents. In 4.6, it absolutely inserts what it thinks should happen, its idea of the bug, and what it believes should be done into the prompt, which often screws up the subagent: the guess is simply wrong, and because it's in the prompt, the subagent doesn't actually go look. Opus 4.5 would let the agent figure it out; 4.6 assumes it knows, and is wrong.
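The dispatch side of that can be as simple as the sketch below. The directory layout, the task id, and the `claude -p` invocation are just assumptions about one way to wire it up, not anything Claude Code prescribes:

    from pathlib import Path
    import subprocess

    PROMPTS = Path("agents/prompts")   # one prompt file per dispatched task
    REPORTS = Path("agents/reports")   # subagents write findings here, not to task output

    def dispatch(task_id: str, prompt: str) -> str:
        """Foreman writes the prompt to disk, runs a subagent, and reads back its report."""
        PROMPTS.mkdir(parents=True, exist_ok=True)
        REPORTS.mkdir(parents=True, exist_ok=True)
        prompt_file = PROMPTS / f"{task_id}.md"
        report_file = REPORTS / f"{task_id}-report.md"
        prompt_file.write_text(prompt)

        # Headless run of the subagent; it is told only where to read and where to write.
        subprocess.run(
            ["claude", "-p",
             f"Follow the instructions in {prompt_file} and write your findings to {report_file}"],
            check=True,
        )
        return report_file.read_text()

    # The prompt files accumulate into a history of what was built and why.
    report = dispatch(
        "login-timeout",
        "Reproduce the login timeout and report the root cause. "
        "Investigate the git history yourself; do not assume a cause.",
    )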
nerdsniper|12 days ago
I just wouldn't call it a regression for my use case; I'm pretty happy with it.
Snakes3727|12 days ago
However, I can honestly say Anthropic is pretty terrible about support, and even billing. My org has a large enterprise contract with Anthropic and we have been hitting endless rate limits across the entire org. They have never once responded to our issues, or we get the same generic AI response.
So the odds of them addressing issues or responding to people feel low.
j45|12 days ago
Put a different way, I have to keep developing my prompting, context, and writing skills at all times, ahead of the curve, before they need to be adjusted.
cheema33|12 days ago
Many people say many things. Just because you read it on the Internet doesn't mean that it is true. Until you have seen hard evidence, take such proclamations with a large grain of salt.
OtomotO|12 days ago
No better code, but way longer thinking and way more token usage.
dakolli|12 days ago
At least in Vegas they don't pour gasoline on the cash put into their slot machines.
reed1234|12 days ago
I doubt it is a conspiracy.
[1] https://www.anthropic.com/news/claude-opus-4-6
PlatoIsADisease|12 days ago
Sam/OpenAI, Google, and Claude met at a park; everyone left their phones in the car.
They took a walk and said, "We are all losing money. If we secretly degrade performance all at the same time, our customers will all switch, but they will all switch at the same time, balancing things out... wink wink wink."