Rperry2174 | 25 days ago

What's interesting to me is that GPT-5.3 and Opus 4.6 are diverging philosophically, and really in the same way that actual engineers and orgs have diverged philosophically.

With Codex (5.3), the framing is an interactive collaborator: you steer it mid-execution, stay in the loop, course-correct as it works.

With Opus 4.6, the emphasis is the opposite: a more autonomous, agentic, thoughtful system that plans deeply, runs longer, and asks less of the human.

that feels like a reflection of a real split in how people think llm-based coding should work...

some want tight human-in-the-loop control and others want to delegate whole chunks of work and review the result

Interested to see whether models eventually optimize for those two philosophies, and for the 3rd, 4th, 5th philosophies that will emerge in the coming years.

Maybe it will be less about benchmarks and more about different ideas of what working-with-ai means

karmasimida|25 days ago

> With Codex (5.3), the framing is an interactive collaborator: you steer it mid-execution, stay in the loop, course-correct as it works.

> With Opus 4.6, the emphasis is the opposite: a more autonomous, agentic, thoughtful system that plans deeply, runs longer, and asks less of the human.

Ain't the UX the exact opposite? Codex thinks much longer before giving you back the answer.

xd1936|25 days ago

I've also had the exact opposite experience with tone. Claude Code wants to build with me, and Codex wants to go off on its own for a while before returning with opinions.

WilcoKruijer|25 days ago

Yes, you’re right for 4.5 and 5.2. Hence they’re focusing on improving the opposite thing and thus are actually converging.

cwyers|24 days ago

Codex now lets you tell the LLM things in the middle of its thinking without interrupting it, so you can read the thinking traces and tell it to change course if it's going off track.

bt1a|25 days ago

This is most likely an inference-serving problem in terms of capacity and latency, given that Opus X and the latest GPT models available in the API have always responded quickly and slowly, respectively.

ghosty141|25 days ago

I'm personally 100% convinced (assuming prices stay reasonable) that the Codex approach is here to stay.

Having a human in the loop eliminates all the problems that LLMs have, and continuously reviewing smallish chunks of code works really well in my experience.

It saves so much time having Codex do all the plumbing so you can focus on the actual "core" part of a feature.

LLMs still can't think and generalize (and I doubt that changes). If I tell Codex to implement 3 features, it won't stop and find a general solution that unifies them unless explicitly told to. This makes it kinda pointless for the "full autonomy" approach, since effectively code quality and abstractions completely go down the drain over time. That's fine if it's just prototyping or "throwaway" scripts, but for bigger codebases where longevity matters it's a dealbreaker.

_zoltan_|25 days ago

I'm personally 100% convinced of the opposite, that it's a waste of time to steer them. We know now that agentic loops can converge given the proper framing and self-reflectiveness tools.
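The "agentic loops can converge" claim above can be illustrated with a toy loop. This is a minimal sketch of the control flow only; `call_model` and `reflect` are stubs invented here for illustration, not any real API:

```python
def call_model(prompt):
    # Stub standing in for a real LLM call. It "improves" its
    # answer only once the prompt carries a revision critique.
    if "REVISE" in prompt:
        return "final answer"
    return "draft answer"

def reflect(answer):
    # Self-reflection step: check the answer against some
    # acceptance criterion instead of trusting the first draft.
    return "final" in answer

def agent_loop(task, max_steps=5):
    prompt = task
    for _ in range(max_steps):
        answer = call_model(prompt)
        if reflect(answer):
            return answer
        # Feed the critique back in and try again.
        prompt = f"{task}\nREVISE: previous attempt was rejected."
    return answer  # give up after max_steps

print(agent_loop("write a sorting function"))  # -> final answer
```

The point of the framing is that the loop, not the human, carries the course-correction: the reflection step rejects the draft and the critique is re-fed automatically.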

xXSLAYERXx|24 days ago

I've been using Codex for one week and I have been the most productive I have ever been. Small PRs, tight rules, I get almost exactly what I want. Things tend to go sideways when scope creeps into my request. But I just close the PR instead of fighting with the agent. In one week: 28 PRs, 26 merged. Absolutely unreal.

vidarh|24 days ago

I will personally never consider using an agent that can't be easily pushed toward working on its own for long periods (hours) at a time. It's a total waste of time for me to babysit the LLM.

NuclearPM|24 days ago

> If I tell Codex to implement 3 features he won't stop and find a general solution that unifies them unless explicitly told to

That could easily be automated.

Skidaddle|24 days ago

But tokens are way cheaper than human labor

sejje|24 days ago

Aider was doing this a long time ago

utilize1808|25 days ago

I think it's the opposite. Especially considering Codex started out as a web app that offers very little interactivity: you are supposed to drop a request and let it run autonomously in a containerized environment; you can then follow up on it via chat --- no interactive code editing.

Rperry2174|25 days ago

Fair, I agree that was true of early Codex and my perception too... but today there are two announcements that came out, and that's what I'm referring to.

Specifically, the GPT-5.3 post explicitly leans into "interactive collaborator" language and steering mid-execution.

OpenAI post: "Much like a colleague, you can steer and interact with GPT-5.3-Codex while it’s working, without losing context."

OpenAI post: "Instead of waiting for a final output, you can interact in real time—ask questions, discuss approaches, and steer toward the solution"

Claude post: "Claude Opus 4.6 is designed for longer-running, agentic work — planning complex tasks more carefully and executing them with less back-and-forth from the user."

mcintyre1994|25 days ago

This kind of sounds like both of them stepping into the other’s turf, to simplify a bit.

I haven’t used Codex but use Claude Code, and the way people (before today) described Codex to me was like how you’re describing Opus 4.6

So it sounds like they're converging toward "both these approaches are useful at different times", potentially? And neither wants people who prefer one way of working to be locked to the other's model.

giancarlostoro|25 days ago

> With Opus 4.6, the emphasis is the opposite: a more autonomous, agentic, thoughtful system that plans deeply, runs longer, and asks less of the human.

This feels wrong. I can't comment on Codex, but Claude will prompt you and ask before changing files. Even when I run it in dangerous mode on Zed, I can still review all the diffs and undo them, or, you know, tell it what to change. If you're worried about it making too many decisions, you can pre-prompt Claude Code (via .claude/instructions.md) and instruct it to always ask follow-up questions regarding architectural decisions.

Sometimes I go out of my way to tell Claude DO NOT ASK ME FOR FOLLOW UPS JUST DO THE THING.
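For reference, the pre-prompting approach described above might look something like this. These are hypothetical contents for such an instructions file; the exact wording and rules are up to you:

```
# Project instructions

- Before making any architectural decision (new dependency, new
  module boundary, changed data model), stop and ask a follow-up
  question instead of guessing.
- Keep changes scoped to what was asked; flag anything speculative.
- Show diffs before applying multi-file changes.
```

Because the file is read at the start of every session, it works as a standing pre-prompt rather than something you have to repeat per request.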

Rperry2174|25 days ago

yeah I'm mostly just talking about how they're framing it: "Claude Opus 4.6 is designed for longer-running, agentic work — planning complex tasks more carefully and executing them with less back-and-forth from the user"

I guess it's also quite interesting that how they're framing these projects is the opposite of how people currently perceive them, and I guess that may be a conscious choice...

jhancock|24 days ago

Good breakdown.

I usually want the codex approach for code/product "shaping" iteratively with the ai.

Once things are shaped and common "scaling patterns" are well established, then for things like adding a front end (which is constantly changing, more views) then letting the autonomous approach run wild can *sometimes* be useful.

I have found that Codex is better at remembering when I ask it not to get carried away... whereas Claude requires constant reminders.

techbro_1a|25 days ago

> With Codex (5.3), the framing is an interactive collaborator: you steer it mid-execution, stay in the loop, course-correct as it works.

This is true, but I find that Codex thinks more than Opus. That's why 5.2 Codex was more reliable than Opus 4.5

dimgl|24 days ago

Did you get those backwards? Codex, Gemini, etc. all wait until the requests are done to accept user feedback. Claude Code allows you to insert messages in between turns.

aurareturn|24 days ago

Codex added an experimental feature to allow steering mid task.

bob1029|25 days ago

I think there is another philosophy where the agent is domain specific. Not that we have to invent an entirely new universe for every product or business, but that there is a small amount of semi-customization involved to achieve an ideal agent.

I would much rather work with things like the Chat Completion API than any frameworks that compose over it. I want total control over how tool calling and error handling works. I've got concerns specific to my business/product/customer that couldn't possibly have been considered as part of these frameworks.

Whether or not a human needs to be tightly looped in could vary wildly depending on the specific part of the business you are dealing with. Having a purpose-built agent that understands where additional verification needs to occur (and not occur) can give you the best of both worlds.
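The "work against the Chat Completion API directly" idea above can be sketched as building the request payload yourself, so you fully control which tools the model sees and their schemas. This is an illustrative sketch, not code from the thread: the tool name `lookup_order` and the model string are invented, and the actual network call is left commented out since it needs an API key.

```python
API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(user_message):
    # We own the payload: exactly one domain-specific tool,
    # with a schema as strict as our business logic requires.
    return {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "lookup_order",  # hypothetical tool
                "description": "Fetch an order by id from our own system.",
                "parameters": {
                    "type": "object",
                    "properties": {"order_id": {"type": "string"}},
                    "required": ["order_id"],
                },
            },
        }],
    }

payload = build_request("Where is order 1234?")
print(payload["tools"][0]["function"]["name"])  # -> lookup_order

# To actually send it (requires an API key):
# import requests
# resp = requests.post(API_URL, json=payload,
#                      headers={"Authorization": f"Bearer {key}"})
```

Because you construct the payload and dispatch the tool calls yourself, deciding where a human must verify a step is just an `if` in your own loop rather than a framework hook.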

aulin|24 days ago

I admit I didn't follow the announcements, but isn't that a matter of UI? It doesn't seem like something that should be baked into the model, but into the tooling around it and the instructions you give them. E.g. I've been playing with the GitHub Copilot CLI (which, despite its bad reputation, is absolutely amazing), and the same model completely changes its behavior with the prompt. You can have it answer a question promptly, or send it on a multi-hour multi-agent exploration writing detailed specs with a single prompt. Or you can have it stop midway for clarification. It all depends on the instructions. This is also particularly interesting with GitHub's billing model, since each prompt counts as 1 request no matter how many tokens it burns.

F7F7F7|24 days ago

It depends honestly. Both are prone to doing the exact opposite of what you asked. Especially with poor context management.

I’ve had both $200 plans and now just have Max x20 and use the $20 ChatGPT plan for an inferior Codex.

My experience (up until today) has always been that Codex acts like that one Sr Engineer that we all know. They are kind of a dick. And will disappear into a dark hole and emerge with a circle when you asked for a pentagon. Then let you know why edges are bad for you.

And yes, Anthropic is pivoting hard into everything agentic. I bet it’s not too long before Claude Code stops differentiating models. I had Opus blow 750k tokens on a single small task.

cchance|25 days ago

Just because you can inject steering doesn't mean they steered away from long-running tasks...

There are hundreds of people posting about Codex 5.2 running for hours unattended and coming back with full commits.

mdale|24 days ago

I think it's just both companies building/marketing to the strength of their competitor, as general perception has been the opposite for Codex and Opus, respectively.

hbarka|25 days ago

How can they be diverging, LLMs are built on similar foundations aka the Transformer architecture. Do you mean the training method (RLHF) is diverging?

iranintoavan|25 days ago

I'm not OP but I suspect they are meaning the products / tooling / company direction, not necessarily the underlying LLM architecture.

sfmike|24 days ago

It's the opposite? Codex course-corrects and is self-inquisitive. Opus is just wrong, and you need to re-feed it that it's wrong.

dboon|24 days ago

…what? It is quite literally the opposite. This isn’t a matter of taste or perception.

blurbleblurble|25 days ago

Funny cause the situation was totally flipped last iteration.

mi_lk|23 days ago

It’s the opposite way

rozumbrada|25 days ago

I've read this exact comment, with I would say completely the same words, several times on X, and I would bet my money it's LLM-generated by someone who hasn't even tried both tools. This AI slop, even on a site like this without direct monetisation implications from fake engagement, is making me sick...

drsalt|24 days ago

be rich, hire an ai guy, let him deal with it

d--b|25 days ago

I am definitely using Opus as an interactive collaborator that I steer mid-execution, staying in the loop and course-correcting as it works.

I mean, Opus asks a lot whether it should run things, and each time you can tell it to change. And if that's not enough, you can always press Esc to interrupt.