andreagrandi|11 days ago
Opus has gone downhill continuously over the last week (and before you flood me with replies: I've been testing Opus and Codex in parallel for that week, and I have plenty of examples of Claude going off track, apologising, saying "now it's all fixed!" and then only fixing part of it, while Codex nailed it on the first try).
I can accept specific model limits, but not reliability that swings up and down. And don't even get me started on how bad the Claude client has become. Others are finally catching up, and gpt-5.3-codex is definitely better than opus-4.6.
Everyone else (Codex CLI, Copilot CLI, etc.) is going open source; Anthropic is going closed. Others (OpenAI, Copilot, etc.) explicitly allow using OpenCode; Anthropic explicitly forbids it.
This hostile behaviour is just the last straw.
super256|11 days ago
It seems like they currently have a lot of false positives: https://github.com/openai/codex/issues?q=High%20risk
andreagrandi|11 days ago
seu|11 days ago
Is a week the whole attention span of the late 2020s?
latexr|11 days ago
_kb|11 days ago
marcus_holmes|11 days ago
abm53|11 days ago
The pattern is people complaining that a particular model's responses have degraded in quality over time, or that it has been "nerfed", etc.
Although the models may evolve, and the tools calling them may change, I suspect a huge amount of this is simply confirmation bias.
ifwinterco|11 days ago
I'll give GPT 5.3 codex a real try I think
Esophagus4|11 days ago
But if people really like Codex better, maybe I’ll try it. I’ve been trying not to pay for 2 subscriptions at once but it might be worth a test.
mosselman|11 days ago
Opus 4.6 wrote me a working macOS application.
Codex wrote me an HTML + CSS mockup of a macOS application that didn't even look like a macOS application at all.
Opus 4.5 was fine, but I feel 4.6 is more often on the money with its implementations than 4.5 was. It is just slower.
kilroy123|11 days ago
trillic|10 days ago
choilive|10 days ago
GorbachevyChase|11 days ago
dannersy|11 days ago
The providers want to control what AI does so they can make money or dominate an industry, rather than having to make their money back right away. This was inevitable; I don't understand why we ever trust these companies.
NamlchakKhandro|11 days ago
andreagrandi|11 days ago
First, we are not talking about a cheap service here. We are talking about a monthly subscription that costs 100 or 200 USD per month, depending on which plan you choose.
Second, it's like selling me a pizza and then insisting I eat it only while sitting at your table. I want to eat the pizza at home. I'm not taking 2-3 extra pizzas; I'm still getting the same pizza everyone else gets.
neya|11 days ago
resiros|11 days ago
unknown|11 days ago
[deleted]
thepasch|10 days ago
I have a feeling Anthropic might be in for an extremely rude awakening when that happens, and I don’t think it’s a matter of “if” anymore.
submain|10 days ago
The latest versions of Claude Code have been freezing and then crashing while waiting on long-running commands. It's pretty frustrating.
WarmWash|10 days ago
Claude has gotten a lot of popular media attention in the last few weeks, and the influx of users is constraining compute/memory on an already compute-heavy model. So you get all the suspected "tricks": quantization, shorter thinking, KV-cache optimizations.
It feels like the same thing that happened to Gemini 3, and you can even feel it throughout the day (the models seem smartest at 12 am).
In his interview with Dwarkesh last week, Dario also repeated the same refrain as other lab leaders: compute is constrained and there are big trade-offs in how you allocate it. It seems safe to assume, then, that they will use any trick they can to free up compute.
cactusplant7374|11 days ago
kasey_junk|11 days ago
At least weekly I run a set of prompts to compare Codex and Claude against each other. That part is easy: the prompt sessions are just text files that get saved.
The hard part is running enough trials for statistical significance and judging whether one output is actually better.
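One way to put a number on "enough for statistical significance" is a paired sign test over the saved sessions: for each prompt, judge which tool's output was better, then ask how likely a win rate that lopsided would be if the two tools were actually equally good. A minimal sketch (the judgment values below are made up for illustration, not real benchmark data):

```python
import math

def sign_test_p(wins_a: int, n: int) -> float:
    """Two-sided sign-test p-value: the probability of a split at least
    this lopsided out of n decisive trials if both tools were equally good."""
    center = n / 2
    tail = sum(
        math.comb(n, k)
        for k in range(n + 1)
        if abs(k - center) >= abs(wins_a - center)
    )
    return tail / 2 ** n

# Hypothetical week of paired judgments: 1 = tool A judged better, 0 = tool B.
judgments = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
wins = sum(judgments)
p = sign_test_p(wins, len(judgments))
print(f"{wins}/{len(judgments)} wins, p = {p:.3f}")  # 8/10 wins, p = 0.109
```

Even an 8-out-of-10 week isn't significant at the usual 0.05 threshold, which illustrates the parent's point: you need a lot of paired prompts before "X is better" is more than a feeling.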
andreagrandi|11 days ago
SkyPuncher|11 days ago
A few things I've noticed:
* 4.6 doesn't look at certain files that it used to
* 4.6 tends to jump into writing code before it has fully understood the problem (annoying, but fixable with prompting)
* 4.6 is less likely to do research, write to artifacts, or make external tool calls unless you specifically ask it to
* 4.6 is much more likely to ask annoying (blocking) questions that it could reasonably figure out on its own
* 4.6 is much more likely to miss a critical detail in a planning document after being explicitly told to plan for that detail
* 4.6 needs to write its memories to file more proactively within a conversation to avoid going off track
* 4.6 is a lot worse about working through critical details. I'm so tired of it explaining something conceptually without thinking through how it would implement the details.
baq|11 days ago
bbstats|11 days ago
andreagrandi|11 days ago