pmx|5 months ago
They need to focus on fixing reliability first. Their systems constantly go down and it appears they are having to quantise the models to keep up with demand, reducing intelligence significantly. New features like this feel pointless when the underlying model is becoming unusable.
mh-|5 months ago
(Speaking of both Claude Code and the desktop app, both Sonnet and Opus >=4, on the Max plan.)
data-ottawa|5 months ago
As an example I’ve been using an MCP tool to provide table schemas to Claude for months.
In early August there was a point where it stopped recognizing the tool unless I mentioned it explicitly. Maybe that’s related to their degraded quality issue.
This morning after pulling the correct schema info Sonnet started hallucinating columns (from Shopify’s API docs) and added them to my query.
That’s a use case I’ve been doing daily for months, and in the last few weeks it has gone from consistent and low-supervision to flaky and low-quality.
I don’t know what’s going on, Sonnet has definitely felt worse, and the timeline matches their status page incident, but it’s definitely not resolved.
Opus 4.1 also feels flaky, it feels like it’s less consistent about recalling earlier prompt details than 4.0.
I personally am frustrated that there’s no refund or anything after a month of degraded performance, and they’ve had a lot of downtime.
pc86|5 months ago
trunnell|5 months ago
They recently resolved two bugs affecting model quality, one of which was in production Aug 5-Sep 4. They also wrote:
Sibling comments are claiming the opposite, attributing malice where the company itself says it was a screw up. Perhaps we should take Anthropic at its word, and also recognize that model performance will follow a probability distribution even for similar tasks, even without bugs making things worse.
pqdbr|5 months ago
OtomotO|5 months ago
I signed up for Claude over a week ago and I totally regret it!
Previously I was using it and some ChatGPT here and there (also had a subscription in the past) and I felt like Claude added some more value.
But it's getting so unstable. It generates code, I see it doing that, and then it throws the code away and gives me the previous version of something 1:1 as a new version.
And then I have to waste CO2 to tell it to please not do that, and then sometimes it generates what I want, sometimes it just generates it again, just to throw it away immediately...
This is soooooooo annoying and the reason I canceled my subscription!
yumraj|5 months ago
I've been running ccusage to monitor, and my usage in $ terms has dropped to 1/3 of what it was a few weeks ago. While some of it could be due to how I'm using it, a drop of 60%-70% cannot be attributed to that alone, and I think it is partly due to the performance.
To add: frequently, as in almost every time: 1) it'll start doing something and will go silent for a long time. 2) pressing esc to interrupt will take a long time to take action since it's probably stuck doing something. Earlier, interrupting via esc used to be almost instantaneous.
So, I still like it, but at my 1/3 drop in measured usage I'm almost tempted to go back to Pro and see if that'll meet my needs.
alvis|5 months ago
allisdust|5 months ago
gjvc|5 months ago
teknologist|5 months ago
bongodongobob|5 months ago
I picked up Claude at the beginning of the summer and have had the same experience.
fuomag9|5 months ago
probably_wrong|5 months ago
Based on the discussions here it seems that every model is either about to be great or was great in the past but now is not. Sucks for those of us who are stuck in the now, though.
darepublic|5 months ago
[deleted]
otabdeveloper4|5 months ago
yazanobeidi|5 months ago
mh-|5 months ago
j45|5 months ago
From troubleshooting Claude by reviewing its performance and digging multiple times into why it did what it did, it seems useful to make sure the first sentence is a clear and complete instruction instead of breaking it up.
As models optimize resources, prompt engineering seems to become relevant again.
paranoidrobot|5 months ago
I had to keep prompting it to generate new artifacts all the time.
Thankfully that is mostly gone with Claude Code.
owenthejumper|5 months ago
srhngpr|5 months ago
ACCount37|5 months ago
https://status.anthropic.com/incidents/72f99lh1cj2c
That being said, they still have capacity issues on any day of the week that ends in Y. No clue how long that will take to resolve.
fragmede|5 months ago
mh-|5 months ago
> we never intentionally degrade model quality as a result of demand or other factors
Fully giving them the benefit of the doubt, I think that still allows for a scenario like "we may [switch to quantized models|tune parameters], but our internal testing showed that these interventions didn't materially affect end user experience".
I hate to parse their words in this way, because I don't know how they could have phrased it in a way that closed the door on this concern, but all the anecdata (personal and otherwise) suggests something is happening.
pmx|5 months ago
siva7|5 months ago
furyofantares|5 months ago
However there have been some bugs causing performance degradation acknowledged by Anthropic as well (and fixed) and so I would guess there's a good amount of real degradation still if people are still seeing issues.
I've seen a lot of people switching to codex cli, and yesterday I did too, for now my 200/mo goes to OpenAI. It's quite good and I recommend it.
rapind|5 months ago
I'll probably come back and try a Claude Code subscription again, but I'm good for the time being with the alternative I found. I also kind of suspect the subscription model isn't going to work for me long term and instead the pay per use approach (possibly with reserved time like we have for cloud compute) where I can swap models with low friction is far more appealing.
ncrtower|5 months ago
j45|5 months ago
I don't feel Claude would do this intentionally, and am reminded how I kept Claude for use for some things but not generally.
syntaxing|5 months ago
sandos|5 months ago
j45|5 months ago
Might be worth trying Claude through Amazon as well.
typpilol|5 months ago
FitchApps|5 months ago
https://www.businessinsider.com/anthropic-ceo-ai-90-percent-...
armchairhacker|5 months ago
https://news.ycombinator.com/item?id=45097263#45098202
nurettin|5 months ago
> Investigating - Last week, we opened an incident to investigate degraded quality in some Claude model responses. We found two separate issues that we’ve now resolved. We are continuing to monitor for any ongoing quality issues, including reports of degradation for Claude Opus 4.1.
https://status.anthropic.com/
SubiculumCode|5 months ago
j45|5 months ago
Sonnet was nearly unusable without a perfect prompt, and it took a separate therapy session with another Sonnet chat to deconstruct how it was no longer working.
There appear to be hard overrides being introduced that overlook basic things like using your personal preferences.
Vague or general descriptions get weighted as less important than strong, clear ones.
brunooliv|5 months ago
footlose_3815|5 months ago
Who knows how it will be next week.
bobbylarrybobby|5 months ago
It's one thing to not cache things for offline use, but it's quite another to intentionally unload items currently in use just because the internet connection dropped!
FloorEgg|5 months ago
Maybe the reliability problems have almost nothing to do with what features they build, and are bottlenecked for completely different reasons.
stpedgwdgfhgdd|5 months ago
Using only 2 MCP servers and not extending claude.md.
hereme888|5 months ago
djrj477dhsnv|5 months ago
AlecSchueler|5 months ago
I knew it wasn't just me when it started using the phrase "chef's kiss" a few weeks ago.
This kind of behaviour is exactly why I avoided the competition and paid for Claude, but now I'm looking around.
trunnell|5 months ago
Maybe. What would you rather have?
A) rock solid Sonnet 4 with Sonnet 5, say, next April
B) buggy Sonnet 4 with Sonnet 5, say, next January
Seems like different customers would have a range of preferences.
This must be one of the questions facing the team at Anthropic: what proportion of effort should go towards quality vs. velocity?
catlifeonmars|5 months ago
stirfish|5 months ago
Rickasaurus|5 months ago
BrawnyBadger53|5 months ago
mrcwinn|5 months ago
esafak|5 months ago
rapind|5 months ago
the_sleaze_|5 months ago
GabeIsko|5 months ago
I'm kidding btw.
leptons|5 months ago
super256|5 months ago
cloudhead|5 months ago
swalsh|5 months ago
OtherShrezzing|5 months ago
DiabloD3|5 months ago
They do not seem to care at all that what they're peddling is just elaborate smoke and mirrors.