(no title)
jampa|1 month ago
It's screwing up even in very simple rebases. I got a bug where a value wasn't being retrieved correctly, and Claude's solution was to create an endpoint and use an HTTP GET from within the same back-end! Now it feels worse than Sonnet.
All the engineers I asked today have said the same thing. Something is not right.
eterm|1 month ago
A model or new model version X is released, and everyone is really impressed.
3 months later, "Did they nerf X?"
It's been this way since the original ChatGPT release.
The answer is typically no; it's just that your expectations have risen. What was previously a mind-blowing improvement is now expected, and any missteps feel amplified.
quentindanjou|1 month ago
What we need is an open and independent way of testing LLMs, and stricter regulation requiring disclosure of product changes when the product is paid for under a subscription or prepaid plan.
jampa|1 month ago
This is not the same thing as an "omg, the vibes are off" complaint: it's reproducible. I am using the same prompts and files and getting far worse results than with any other model.
mrguyorama|1 month ago
If LLMs have a 90% chance of working, there will be some users who see only successes and some who see only failures.
People are really failing to understand the probabilistic nature of all of this.
"You have a radically different experience with the same model" is perfectly possible with less than hundreds of thousands of interactions, even when you both interact in comparable ways.
F7F7F7|1 month ago
I’m a Max x20 user who had to stop using it this week. Opus was regularly failing at the most basic things.
I regularly use the front-end skill to pass in mockups, and Opus was always pixel-perfect. This last week it seemed like the skill had no effect.
I don’t think they are purposely nerfing it, but they are definitely using us as guinea pigs. A quantized model? The next Sonnet? The next Haiku? New tokenizing strategies?
ryanar|1 month ago
I used this command with Sonnet 4.5 too and never had a problem until this week. Something changed, either in the harness or the model. This is not just vibes: workflows I have run hundreds of times have stopped working with Opus 4.5.
hirako2000|1 month ago
Or maybe when usage is high they tweak a setting so that it uses the cache when it shouldn't.
For all we know, they run whatever experiments they want: to demonstrate theoretically better margins, or to analyse user patterns when a performance drop occurs.
Given what is done in other industries that don't face an existential threat, it wouldn't surprise me if some whistleblowers tell us in a few years what's been going on.
landl0rd|1 month ago
An upcoming IPO increases pressure to make financials look prettier.
epolanski|1 month ago
In fact, as my prompts and documents get better, it does increasingly better.
Still, it can't replace a human: I regularly need to correct it, and if I try to one-shot a feature I always end up spending more time refactoring it a few days later.
It's a huge boost to productivity, but the day it can take over without detailed instructions and oversight is far away.
cap11235|1 month ago