top | item 47083498

keyle | 9 days ago

Have you noticed how it changes throughout their release cycles?

It's so strange. I feel it myself when using the tools: one day is different from the next in terms of how much thinking a model is going to do.

I'm starting to wonder if a new model isn't just a tweak of an existing one: make a big deal about it, make thinking stronger, get good reviews on blogs, then tweak it back down to save costs.

They seem to go through these waves. Otherwise, how can you explain that they release new models _on the same day_, within hours of each other?

I think we're all being fooled about these incremental updates. Many people are reporting that the models are worse now than in December. I felt it too for many queries. I understand they're trying to balance cost with response quality but it seems quite erratic and gamified.

Falimonda | 9 days ago

Opus 4.6 overthinks and burns tokens in my experience. I switched back to 4.5 after just the first two tasks.

Why would I want it to "think" more than it apparently needs to with 4.5?

xyzsparetimexyz | 9 days ago

I think the thinking mode is a net negative in a significant number of cases. I've had an issue in a file that Claude failed to mention in its regular output, even though it had considered the issue in its thinking and then dismissed it out of hand.