top | item 45182714

pmx | 5 months ago

They need to focus on fixing reliability first. Their systems constantly go down and it appears they are having to quantise the models to keep up with demand, reducing intelligence significantly. New features like this feel pointless when the underlying model is becoming unusable.

mh-|5 months ago

This can't be understated. I started using it heavily earlier this summer and it felt like magic. Someone signing up now based on how I described my personal experiences with it then would think I was out of my mind. For technical tasks it has been a net negative for me for the last several weeks.

(Speaking of both Claude Code and the desktop app, both Sonnet and Opus >=4, on the Max plan.)

data-ottawa|5 months ago

I don’t think you’re crazy, something is off in their models.

As an example I’ve been using an MCP tool to provide table schemas to Claude for months.

There was a point in early August where it stopped recognizing the tool unless I mentioned it explicitly. Maybe that's related to their degraded quality issue.

This morning after pulling the correct schema info Sonnet started hallucinating columns (from Shopify’s API docs) and added them to my query.

That’s a use case I’ve been doing daily for months and in the last few weeks has gone from consistent low supervision to flaky and low quality.

I don’t know what’s going on, Sonnet has definitely felt worse, and the timeline matches their status page incident, but it’s definitely not resolved.

Opus 4.1 also feels flaky, it feels like it’s less consistent about recalling earlier prompt details than 4.0.

I personally am frustrated that there’s no refund or anything after a month of degraded performance, and they’ve had a lot of downtime.

pc86|5 months ago

I hesitate to use phrases like "bait and switch" but it seems like every model gets released and is borderline awe-inspiring, then as adoption increases, and load increases, it's like it gets hit in the head with a hammer and is basically useless for anything beyond a multi-step google search.

trunnell|5 months ago

https://status.anthropic.com/incidents/72f99lh1cj2c

They recently resolved two bugs affecting model quality, one of which was in production Aug 5-Sep 4. They also wrote:

  Importantly, we never intentionally degrade model quality as a result of demand or other factors, and the issues mentioned above stem from unrelated bugs.

Sibling comments are claiming the opposite, attributing malice where the company itself says it was a screw-up. Perhaps we should take Anthropic at its word, and also recognize that model performance will follow a probability distribution even for similar tasks, even without bugs making things worse.

pqdbr|5 months ago

Same here. Even with Opus in Claude Code I'm getting terrible results, sometimes feeling like we went back to the GPT-3.5 era. And it seems they are implementing heavy token-saving measures: the model doesn't read context anymore unless you force it to, making up method calls as it goes.

OtomotO|5 months ago

This, so much this...

I signed up for Claude over a week ago and I totally regret it!

Previously I was using it and some ChatGPT here and there (also had a subscription in the past) and I felt like Claude added some more value.

But it's getting so unstable. It generates code, I see it doing that, and then it throws the code away and gives me the previous version of something 1:1 as a new version.

And then I have to waste CO2 telling it to please not do that, and then sometimes it generates what I want, sometimes it just generates it again, only to throw it away immediately...

This is soooooooo annoying and the reason I canceled my subscription!

yumraj|5 months ago

I had even posted an Ask HN asking if people had experienced issues with Claude Code, since for me it's slowed down substantially; it'll frequently just pause and take much longer. I have a Claude Max 5X plan.

I've been running ccusage to monitor, and my usage in $ terms has dropped to a third of what it was a few weeks ago. Some of that could be due to how I'm using it, but a drop of 60-70% cannot be attributed to that alone; I think it's partly due to the performance.

To add: frequently, as in almost every time: 1) it'll start doing something and go silent for a long time; 2) pressing Esc to interrupt takes a long time to take effect, since it's probably stuck doing something. Earlier, interrupting via Esc was almost instantaneous.

So, I still like it, but at my 1/3 drop in measured usage I'm almost tempted to go back to Pro and see if that'll meet my needs.

allisdust|5 months ago

Yup. Opus 4.1 has been feeling like absolute dog shit and it has made me give up in frustration several times. They really did downgrade their models. The Max plan is a joke now. I'm barely using Pro-level tokens since it's a net negative on my productivity. Enshittification is now truly in place.

gjvc|5 months ago

"can't be overstated", you mean

bongodongobob|5 months ago

Thanks for the confirmation. Lately it's been telling me it has made edits or written code yet it's nowhere to be seen. It's been messing up extremely simple tasks like "move this knob from the bottom of the screen to the right". Over and over it will insist it made the changes but it hasn't. Getting confused about completely different sections of code and files.

I picked up Claude at the beginning of the summer and have had the same experience.

fuomag9|5 months ago

I feel like the model has degraded lately as well. I've been using Claude every day for months now.

probably_wrong|5 months ago

Have you considered perhaps that you are, indeed, out of your mind? Or more precisely, that you could be rationalizing what is essentially a random process?

Based on the discussions here it seems that every model is either about to be great or was great in the past but now is not. Sucks for those of us who are stuck in the now, though.

otabdeveloper4|5 months ago

Congrats, you grew up. It's not Claude's fault.

yazanobeidi|5 months ago

Have you run into the bug where claude acts as if it updated the artifact, but it didn’t? You can see the changes in real time, but then suddenly it’s all deleted character by character as if the backspace was held down, you’re left with the previous version, but claude carries on as if everything is fine. If you point it out, it will acknowledge this, try again, and… same thing. The only reliable fix I’ve seen is to ask it to generate a new artifact with that content and the updates. Talk about wasting tokens, and no refunds, no support, you’re on your own entirely. It’s unclear how they can seriously talk about releasing this feature when there are fundamental issues with their existing artifact creation and editing abilities.

mh-|5 months ago

Yes, just had it happen a couple nights ago with a simple one pager I asked it to generate from some text in a project. It couldn't edit the existing artifact (I could see it being confused as to why the update wasn't taking in the CoT), so it made a new version for every incremental edit. Which of course means there were other changes too, since it was generating from scratch each time.

j45|5 months ago

Yes, this has been happening a lot more the past 8 weeks.

From troubleshooting Claude by reviewing its performance and digging in multiple times on why it did what it did, it seems useful to make sure the first sentence is a clearer, more complete instruction instead of breaking it up.

As models optimize resources, prompt engineering seems to become relevant again.

paranoidrobot|5 months ago

Yes, this was so frustrating.

I had to keep prompting it to generate new artifacts all the time.

Thankfully that is mostly gone with Claude Code.

owenthejumper|5 months ago

Happens all the time. Like right now

srhngpr|5 months ago

I came here to share the exact same thing - this has been happening for weeks now and it is extremely frustrating. Have to constantly tell Claude to rewrite the artifact from scratch or write it from scratch into a new artifact. This needs to be a priority item to fix.

ACCount37|5 months ago

Anthropic claims that they don't degrade models under load, and the performance issues were a result of a system error:

https://status.anthropic.com/incidents/72f99lh1cj2c

That being said, they still have capacity issues on any day of the week that ends in Y. No clue how long that will take to resolve.

fragmede|5 months ago

> Last week, we opened an incident to investigate degraded quality in some Claude model responses. We found two separate issues that we’ve now resolved.

mh-|5 months ago

Not nitpicking, but they said:

> we never intentionally degrade model quality as a result of demand or other factors

Fully giving them the benefit of the doubt, I think that still allows for a scenario like "we may [switch to quantized models|tune parameters], but our internal testing showed that these interventions didn't materially affect end-user experience".

I hate to parse their words this way, because I don't know how they could have phrased it in a way that closed the door on this concern, but all the anecdata (personal and otherwise) suggests something is happening.

pmx|5 months ago

Frankly, I don't believe their claims that they don't degrade the models. I know we see models as less intelligent as we get used to them and their novelty wears off but I've had to entirely give up on Claude as a coding assistant because it seems to be incapable of following instructions anymore.

siva7|5 months ago

Then check the news again. They already admitted that, due to bugs, model output was degraded for over a month.

furyofantares|5 months ago

Some of this has gotta be people asking more of it than they did before, and some has gotta be people who happened to use it for things it's good at to begin with and are now asking it things it's bad at (not necessarily harder things, just harder for the model).

However there have been some bugs causing performance degradation acknowledged by Anthropic as well (and fixed) and so I would guess there's a good amount of real degradation still if people are still seeing issues.

I've seen a lot of people switching to codex cli, and yesterday I did too, for now my 200/mo goes to OpenAI. It's quite good and I recommend it.

rapind|5 months ago

What makes it particularly tricky to evaluate is that there could still be other bugs given how long these went without even acknowledgement until now, and they did state they are still looking into potential Opus issues.

I'll probably come back and try a Claude Code subscription again, but I'm good for the time being with the alternative I found. I also kind of suspect the subscription model isn't going to work for me long term and instead the pay per use approach (possibly with reserved time like we have for cloud compute) where I can swap models with low friction is far more appealing.

ncrtower|5 months ago

The same experience here: Claude on the Pro plan over the summer was really doing a good job. The last 4 weeks? Constant slowdowns or API errors, more hallucinating than before, and many mistakes. It appears to me that they are throttling to handle loads they can't actually handle.

j45|5 months ago

The last 4 weeks have been awful. I have barely used my Max plan compared to the month before, and it's an active deterrent to use it when you don't know if it's going to work or hit an unpredictable limit before you get something working.

I don't feel Claude would do this intentionally, and am reminded how I kept Claude for use for some things but not generally.

syntaxing|5 months ago

I wonder if their API model is different from the subscription model. People called me crazy for saying GitHub Copilot is better than Claude Code, but since I started using Claude Code these past 3 weeks, time and time again Copilot + Claude Sonnet 4 has been better.

sandos|5 months ago

Copilot made a giant leap, imo, when Sonnet 4 arrived. BUT I do have a lot of temporary problems where it just stops responding. Last week was awful; today, though, it worked perfectly. I both vibe-coded a very wide (TUI, GUI, web UI, CLI, backend, etc.) Python util for our specific product+environment and solved a bug in parallel using Sonnet 4 and GPT-4.1. I tried going to Sonnet when GPT fscked up, and it's just hilarious: GPT can sometimes try to fix things 5 times in a row, while Sonnet just directly fixes it. If only the enterprise quota were infinite... :)

j45|5 months ago

API has always been a little different.

Might be worth trying Claude through Amazon as well.

typpilol|5 months ago

Agreed.. copilot is way better

armchairhacker|5 months ago

> "The model is getting worse" has been rumored so often, by now, shouldn't there be some trusted group(s) continually testing the models so we have evidence beyond anecdote?

https://news.ycombinator.com/item?id=45097263#45098202

nurettin|5 months ago

Here's some evidence

> Investigating - Last week, we opened an incident to investigate degraded quality in some Claude model responses. We found two separate issues that we’ve now resolved. We are continuing to monitor for any ongoing quality issues, including reports of degradation for Claude Opus 4.1.

https://status.anthropic.com/

SubiculumCode|5 months ago

Normally, I'd say yeah right, but I've been kind of feeling this too...and the thing is, we can't really know what they are running. It would be nice to have a private eval metric to monitor these things over time.
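
A private eval metric like the one wished for above can be sketched in a few lines. This is my own toy harness, not any official tooling, and `ask_model` is a hypothetical placeholder to be wired to whichever API or CLI you actually use; the idea is that a fixed, private set of prompt/check pairs logged over time turns "the model got worse" into a measurable claim instead of a feeling.

```python
# Hedged sketch of a personal eval harness: run a fixed, private set of
# prompt/check pairs on a schedule and append the pass rate to a log file.
import json
import time

EVALS = [
    {"prompt": "What is 17 * 23?", "check": lambda out: "391" in out},
    {"prompt": "Reverse the string 'claude'.", "check": lambda out: "edualc" in out},
]

def ask_model(prompt: str) -> str:
    # Placeholder: wire this to your model provider of choice.
    raise NotImplementedError("call your model provider here")

def run_evals(ask=ask_model, log_path="eval_log.jsonl"):
    # Score each prompt with its check, then append a timestamped record.
    passed = sum(1 for e in EVALS if e["check"](ask(e["prompt"])))
    record = {"ts": time.time(), "pass_rate": passed / len(EVALS)}
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Plotting `pass_rate` from the JSONL log over weeks would give exactly the private, longitudinal signal the comment asks for.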

j45|5 months ago

I hit a limit this morning so fast, and the apparent quantization makes me think they're serving different models.

Sonnet was nearly unusable without a perfect prompt, and it took a separate therapy session with another Sonnet chat to deconstruct how it was no longer working.

There appear to be hard overrides being introduced that overlook basic things like using your personal preferences.

Vague or general descriptions get weighted as less important than strong, clear ones.

brunooliv|5 months ago

Agreed! It's been horrible recently; it feels like a completely different model under the hood. Before, I could use it as a real sparring partner for architecture designs and decisions, and I actually learned in the process. Now it's like its sycophancy is tuned to the max: it just agrees with me, does the bare minimum, and produces code that doesn't compile. For that I have the humans, ah!

footlose_3815|5 months ago

Yes. Last week sucked. We were thinking of sticking with them, but it seems they're shakier than I thought. With all that, and GPT-5 just kicking Opus 4.1's butt in cost, reliability, and quality, I'm leaning OpenAI again.

Who knows how it will be next week.

bobbylarrybobby|5 months ago

Their iOS app could use some serious love. Not only does it have no offline capabilities (you can't even read your previous chats), if you're using the app and go offline, it puts up a big “connection lost; retry” alert and won't let you interact with the app until you get internet again. That means if you're mid prompt, you're blocked from editing further, and if you're reading a response, you have to wait until you get cell service again to continue reading.

It's one thing to not cache things for offline use, but it's quite another to intentionally unload items currently in use just because the internet connection dropped!

FloorEgg|5 months ago

Maybe the people who build features like these are not the same people who buy cards and build data centers?

Maybe the reliability problems have almost nothing to do with what features they build, and are bottlenecked for completely different reasons.

stpedgwdgfhgdd|5 months ago

I have not noticed a degradation in quality in the last few weeks. Not saying it is perfect, but the quality has been similar (using Sonnet) for the last month.

Using only 2 MCP servers and not extending claude.md.

hereme888|5 months ago

Would anyone agree with my experience that OpenAI has the most robust and reliable LLM ecosystem atm? One week I really like Gemini 2.5 pro, the next I thought Claude was better, a few days I thought Grok 4 was pretty good (grok 4 is the most inconsistent "model"). But at the end, I default to OpenAI for overall consistency and reliability.

djrj477dhsnv|5 months ago

For the last 6 months or so, Grok had been the most consistent for me, especially for anything that relies heavily on search.

AlecSchueler|5 months ago

I haven't actually noticed a marked decrease in intelligence, but things like style, tone, and sycophancy have all suffered a lot recently.

I knew it wasn't just me when it started using the phrase "chef's kiss" a few weeks ago.

This kind of behaviour is exactly why I avoided the competition and paid for Claude, but now I'm looking around.

trunnell|5 months ago

> They need to focus on fixing reliability first.

Maybe. What would you rather have?

A) rock solid Sonnet 4 with Sonnet 5, say, next April

B) buggy Sonnet 4 with Sonnet 5, say, next January

Seems like different customers would have a range of preferences.

This must be one of the questions facing the team at Anthropic: what proportion of effort should go towards quality vs. velocity?

catlifeonmars|5 months ago

What does it mean to quantise a model?

stirfish|5 months ago

Basically you trade accuracy for space, so you use fewer resources

Rickasaurus|5 months ago

It means changing the representation to fewer bits per floating-point number: lower-resolution numbers.

BrawnyBadger53|5 months ago

Reducing the number of bits per float, it's like compression for models
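
As a concrete sketch of what the answers above describe (my own toy illustration, unrelated to Anthropic's actual serving stack): squeezing float32 weights into int8 quarters the storage, at the cost of rounding error in every value.

```python
# Toy symmetric int8 quantization of a float32 weight array.
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 values onto 256 int8 levels using one scale factor."""
    scale = np.abs(weights).max() / 127.0  # largest magnitude maps to 127
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 values; the rounding loss is permanent."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)
q, scale = quantize_int8(w)

print("storage: %d -> %d bytes" % (w.nbytes, q.nbytes))  # 4000 -> 1000
print("max round-trip error:", np.abs(w - dequantize(q, scale)).max())
```

Real inference stacks use more sophisticated schemes (per-channel scales, 4-bit formats), but the trade-off is the same: less memory and faster math per weight, slightly less faithful outputs.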

mrcwinn|5 months ago

Agree. Even the web client itself is very buggy. I've almost completely stopped using anything Anthropic makes at this point. GPT-5 had a rocky start, but I think overall it's stellar, has the most features, and the client is very reliable for me.

esafak|5 months ago

I have not noticed a degradation in Claude, but I feel that with Gemini 2.5 Pro.

rapind|5 months ago

Apparently the "bugs" only affected some users... which in itself is kind of worrisome... I suspect the changes they made to limit abusers might have been misclassifying some "good" users. Like shadow throttling. This is just a suspicion based on possibly coincidental timing though.

the_sleaze_|5 months ago

I've not felt it with Claude. Gemini becomes slow and unresponsive at times. However Cursor routinely turns into a toddler banging on the keyboard. God forbid I press the tab key to move a line, lest Cursor deletes some CSS classes halfway down the file.

GabeIsko|5 months ago

That's not it! Direct engineering effort towards new features that will drive new customers and markets. Functionality is unimportant. Haven't you ever worked in enterprise software?

I'm kidding btw.

leptons|5 months ago

How much are you willing to pay for it? Maybe they just need a few billion more dollars to shovel into the furnace to keep the "AI" going faster.

super256|5 months ago

At least some transparency would be nice. It feels like they are serving less intelligent models labelled as more intelligent ones during peak times.

cloudhead|5 months ago

The web interface is also so laggy on Firefox that I've started using other free offerings more, despite paying for Claude.

swalsh|5 months ago

The people shipping these features are not the same people who are fixing reliability probably.

OtherShrezzing|5 months ago

No, but the salaries of the people shipping those features could be spent on people who can fix the reliability problems.

DiabloD3|5 months ago

Anthropic needs to continue burning cash and goodwill in hopes they extend the runway to IPO.

They do not seem to care at all that what they're peddling is just elaborate smoke and mirrors.