We were heavy users of Claude Code ($70K+ spend per year) and have almost completely switched to codex CLI. I'm doing massive lifts with it on software that would never before have been feasible for me personally, or any team I've ever run. I'll use Claude Code maybe once every two weeks as a second set of eyes to inspect code and document a bug, with mixed success. But my experience has been that initially Claude Code was amazing and a "just take my frikkin money" product. Then Codex overtook CC and is much better at longer runs on hard problems. I've seen Claude Code literally just give up on a hard problem and tell me to buy something off the shelf. Whereas Codex's ability to profoundly increase the capabilities of a software org is a secret that's slowly getting out.
I don't have any relationship with any AI company, and honestly I was rooting for Anthropic, but Codex CLI is just way way better.
Also Codex CLI is cheaper than Claude Code.
I think Anthropic are going to have to somehow leapfrog OpenAI to regain the position they were in around June of this year. But right now they're being handed their hat.
Feels like with every announcement there’s the same comment: “this LLM tool I’m using now is the real deal, the thing I was using previously and spending stupid amounts of money on looked good but failed at XYZ, this new thing is where it’s at”. Rinse and repeat.
Which means it wasn’t true any of the previous times, so why would it be true this time? It feels like an endless loop of the “friendship ended” meme with AI companies.
It’s much more likely commenters are still in the honeymoon hype phase and (again) haven’t found the problems because they’re hyper focused on what the new thing is good at that the previous one wasn’t, ignoring the other flaws. I see that a lot with human relationships as well, where people latch on to new partners because they obviously don’t have the big problem that was a strain on the previous relationship. But eventually something else arises. Rinse and repeat.
I find Codex CLI to be very good too, but it’s missing tons of features that I use in Claude Code daily that keep me from switching full time.
- Good bash command permission system
- Rollbacks coupled with conversation and code
- Easy switching between approval modes (Claude had a keybind that makes this easy)
- Ability to send messages while it’s working (Codex just queues them up for after it’s done, Claude injects them into the current task)
- Codex is very frustrating when I have to keep allowing it to run the same commands over and over, Claude this works well when I approve it to run a command for the session
- Agents (these are very useful for controlling context)
- A real plan mode (crucial)
- Skills (these are basically just lazy loaded context and are amazing)
- The sandboxing in codex is so confusing, commands fail all the time because they try to log to some system directory or use internet access which is blocked by default and hard to figure out
- Codex prefers python snippets to bash commands which is very hard to permission and audit
When Codex gets to feature parity, I’ll seriously look at switching, but until then it’s just a really good model wrapped in an okay harness
I totally agree. I remember the June magic as well - almost overnight my abilities and throughput were profoundly increased, I had many weeks of late nights in awe and wonder trying things that were beyond my ability to implement technically but within the bounds of my conceptual understanding.
Initially, I found Codex CLI with GPT-5 to be a substitute for Claude Code - now GPT-5 Codex materially surpasses it in my line of work, with a huge asterisk. I work in a niche industry, and Codex has generally poor domain understanding of many of the critical attributes and concepts. Claude happens to have better background knowledge for my tasks, so I've found that Sonnet 4.5 with Claude Code generally does a better job at scaffolding any given new feature. Then, I call in Codex to implement actual functionality since Codex does not have the "You're absolutely right" and mocked/placeholder implementation issues of CC, and just generally writes clean, maintainable, well-planned code. It's the first time I've ever really felt the whole "it's as good as a senior engineer" hype - I think, in most cases, GPT5-Codex finally is as good as a senior engineer for my specific use case.
I think Codex is a generally better product with better pricing, typically 40-50% cheaper for about the same level of daily usage for me compared to CC. I agree that it will take a genuinely novel and material advancement to dethrone Codex now. I think the next frontier for coding agents is speed. I would use CC over Codex if it was 2x or 3x as fast, even at the same quality level. Otherwise, Codex will remain my workhorse.
Im not saying this is a paid endorsement but the internet is dead and I wonder what openAI would pay, if they could, to get such a glowing review as top comment on HN
I agree with this and actually Claude Code agrees with it too. I've had Codex cli (gpt-5-codex high) and claude code 4.5 sonnet (and sometimes opus 4.1) do the same lengthier task with the same prompt in cloned folders about 10x now and then I ask them to review the work in the other folder and determine who did the best job.
100% of the time Codex has done a far better job according to both Codex and Claude Code when reviewing. Meeting all the requirements where Claude would leave things out, do them lazily or badly and lose track overall.
Codex high just feels much smarter and more capable than Claude currently and even though it's quite a bit slower, it's work that I don't have to go over again and again to get it to the standards I want.
Yeah this has been my experience as well. The Claude Code UI is still so much better, and the permissioning policy system is much better. Though I'm working on closing that gap by writing a custom policy https://github.com/openai/codex/blob/main/codex-rs/execpolic...
Kinda sick of Codex asking for approval to run tests for each test instance
Still a toss-up for me which one I use. For deep work Codex (codex-high) is the clear winner, but when you need to knock out something small Claude Code (sonnet) is a workhorse.
Also CC tool usage is so much better! Many, many times I’ve seen Codex writing a python script to edit a file which seems to bypass the diff view so you don’t really know what’s going on.
Yeah, after correcting it several times I've gotten Claude Code to tell me it didn't have the expertise to work in one of my problem domains. It was kinda surprising but also kinda refreshing that it knew when to give up. For better or worse I haven't noticed similar things with Codex.
I did the opposite I switched to Claude code once the released the new model last week of the one before, I tried using codex, but there was issues with the terminal and prompting (multiple characters getting deleted) I found Claude code to have more features and less bugs, like the edit on vim for the prompt being really useful and find it better to iterate. Also I like more its tool usage and the use of the shell. Sometimes codex prefer to use python instead of doing the equivalent shell command. Maybe it's like the other people say here, that codex it's better for long running tasks, I prefer to give Claude small tasks and I'm usually satisfied with the result and I like to work alongside the agent
This is such an interesting perspective because I feel codex is hugely impressive but falls apart on any even remotely difficult task and is too autonomous and not eager enough.
Claude feels like a better fit for an experienced engineer. He's a positive, eager little fellow.
My experience is that if I know what I want, CC will produce better code, given I specify it correctly. The planning mode is great for this too, as we can "brainstorm" and what I have seen help a lot is if I ask questions about why it did a certain way. Often it'll figure out on its own why that's wrong, but sometimes it requires a bit of course correction.
On the other hand, last time I tried GPT-5 from Cursor, it was so disappointing. It kept getting confused while we were iterating on a plan, and I had to explain to it multiple times that it's thinking about the problem the same way. After a while I gave up, opened a new chat and gave it my own summary of the conversation (with the wrong parts removed) and then it worked fine. Maybe my initial prompt was vague, but it continually seemed to forget course corrections in that chat.
I mostly tend to use them more to save me from typing, rather than asking it to design things. Occasionally we do a more open ended discussion, but those have great variance. It seems to do better with such discussions online than within the coding tool (I've bounced maths/implementation ideas off of while writing shaders on a personal project)
When you say Claude Code, what model do you refer to? CC with Opus still outperforms Codex (gpt-5-codex) for me for anything I do (Rust, computer graphics-related).
However, Anthropic restricted Opus use for Max plan users 10 days or so ago severly (12-fold from 40h/week down to 5h week) [1].
Sonnet is a vastly inferioir model for my use cases (but still frequently writes better Rust code than Codex). So now I use Codex for planning and Sonnet for writing the code. However, I usually need about 3--5 loops with Codex reviewing, Sonnet fixing, rinse & repeat.
Before I could use one-shot Opus and review myself directly, and do one polish run following my review (also via Opus). That was possible from June--mid October but no more.
On the topic of comparing OpenAI models with Anthropocene models, I have a hybrid approach that seems really nice.
I set up an MCP tool to use gpt-5 with high reasoning with Claude Code (like tools with "personas" like architect, security reviewer, etc), and I feel that it SIGNIFICANTLY amplifies the performance of Claude alone. I don't see other people using LLMs as tools in these environments, and it's making me wonder if I'm either missing something or somehow ahead of the curve.
Basically instead of "do x (with details)" I say "ask the architect tool for how you should implement X" and it gets into this back and forth that's more productive because it's forcing some "introspection" on the plan.
> We were heavy users of Claude Code ($70K+ spend per year)
Claude code has only been generally available since May last year (a year and half ago)... I'm surprised by the process that you are implying; within a year and a half, you both spent 70k on claude code, and knew enough about it and its competition to switch away from it? I dont think I'd be able to due diligence even if LLM evaluation was my fulltime job. Let alone the fact that the capabilities of each provider are changing dramatically every few weeks.
Claude Code is still good but I don’t TRUST it. With Claude Code and Sonnet I’m expecting failure. I can get things done but there’s an administrative overhead of futzing around with markdown files, defensive commit hooks and unit tests to keep it on rails while managing the context panic. Codex CLI with gpt-5-codex high reasoning is next gen. I’m sure Sonnet 5 will match it soon. At that point I think a lot of the workflows people use in Claude Code will be obsolete and the sycophancy will disappear.
In agreement. Large caveats that can explain differing opinions (that I've experienced) are:
* Is really only magic on Linux or WSL. Mediocre on Windows
* Is quite mediocre at UI code but exceptional at backend, engineering, ops, etc. (I use Claude to spruce up everything user facing -- Codex _can_ mirror designs already in place fairly well).
* Exceptional at certain languages, OK at others.
* GPT-5 and GPT-5-Codex are not the same. Both are models used by the Codex CLI and the GPT-5-Codex model is recent and fantastically good.
* Codex CLI is not "conversational" in the way that Claude is. You kind of interact with it differently.
I often wonder about the impact of different prompting styles. I think the WOW moment for me is that I am no longer returning to code to find tangled messes, duplicate silo'd versions of the same solution (in a different project in the same codebase), or strangely novice style coding and error handling.
As a developer for 20yrs+, using Codex running the GPT-5-Codex model has felt like working with a peer or near-peer for the first time ever. I've been able to move beyond smaller efforts and also make quite a lot of progress that didn't have to be undone/redone. I've used it for a solid month making phenomenal progress and able to offload as-if I had another developer.
Honestly, my biggest concern is that OpenAI is teasing this capable model and then pulls the rug in a month with an "update".
As for the topic at hand, I think Claude Code has without a doubt the best "harness" and interface. It's faster, painless, and has a very clean and readable way of laying out findings when troubleshooting. If there were a cheap and usable version of Opus... perhaps that would keep Claude Code on the cutting edge.
> I've seen Claude Code literally just give up on a hard problem and tell me to buy something off the shelf
I've been seeing more of this lately despite initial excellent results. Not sure what's going on, but the value is certainly dropping for me. I'll have to check out codex. CLI integration is critical for me at this point. For me it is the only thing that actually helps realize the benefits of LLM models we have today. My last NixOS install was completely managed by Claude Code and it worked very well. This was the result of my latest frustrations:
Though I know the statement it made isn't "true". I've had much better luck pursuing other implementation paths with CC in the same space. I could have prompted around this and should have reset the context much earlier but I was drunk "coding" at that point and drove it into a corner.
I haven’t used Codex a lot, but GPT-5 is just a bit smarter in agent mode than Claude 4.5. The most challenging thing I’ve used it for is for code review and GPT-5 somewhat regularly found intricate bugs that Claude missed. However, Claude seemed to be better at following directions exactly vs GPT-5 which requires a lot more precision.
This was my experience too until a couple weeks ago, when Codex suddenly got dumbed down.
Initially, I had great success with codex medium- I could refactor with confidence, code generally ran on the first or second try, etc.
Then when that suddenly dumbed down to Claude Sonnet 3.5 quality I moved to GPT5 High to get back what had been lost. That was okay for a few days. Now GPT5 High has dropped to Claude Sonnet 3.5 quality.
Costs are 6x cheaper and it's way faster and good at test writing and tool calling. It some times can be a bit messy though so use Gemini or Claude or codex for that hard problems....
Similar feeling. Seems it is good at certain things and if something doesnt work it want to do things simply and in turn becomes something that you didnt ask for and certain times opposite of what you wanted. On the other hand with codex certain time you feel the AGI but that is like 2 out of 10 sessions. This is primarily may be due to how complete the prompt and how well you define the problems.
Totally agree. I was just thinking that I wouldn't want this feature for Claude Code but for Codex right now it would be great! I can simply let tasks run in Codex and I know it's going to eventually do what I want. Where as with Claude Code I feel like I have to watch it like a hawk and interrupt it when it goes off the rails.
My experience is similar, but for me, Claude Code is still better when designing or developing a frontend page from scratch. I have seen that Codex follows instructions a bit too literally, and the result can feel a little cold.
CC on the other hand feels more creative and has mostly given better UI.
Of course, once the page is ready, I switch to Codex to build further.
Does no one use Blocks Goose CLI anymore? I went to a hackathon in SF at the beginning of the year and it seemed like 90% of the groups used Goose to do something in their Agent project. I get that the CLI agent scene has exploded since then I just wonder what what is so much better in the competition?
It's really solid. It's effectively a web (and native mobile) UI over Claude Code CLI, more specifically "claude --dangerously-skip-permissions".
Anthropic have recognized that Claude Code where you don't have to approve every step is massively more productive and interesting than the default, so it's worth investing a lot of resources in sandboxing.
The most interesting parts of this to me are somewhat buried:
- Claude Code has been added to iOS
- Claude Code on the Web allows for seamless switching to Claude Code CLI
- They have open sourced an OS-native sandboxing system which limits file system and network access _without_ needing containers
However, I find the emphasis on limiting the outbound network access somewhat puzzling because the allowlists invariably include domains like gist.github.com and dozens of others which act effectively as public CMS’es and would still permit exfiltration with just a bit of extra effort.
I feel like these background agents still aren't doing what I want from a developer experience perspective. Running in an inaccessible environment that pushes random things to branches that I then have to checkout locally doesn't feel great.
AI coding should be tightly in the inner dev loop! PRs are a bad way to review and iterate on code. They are a last line of defense, not the primary way to develop.
Give me an isolated environment that is one click hooked up to Cursor/VSCode Remote SSH. It should be the default. I can't think of a single time that Claude or any other AI tool nailed the request on the first try (other than trivial things). I always need to touch it up or at least navigate around and validate it in my IDE.
> We were heavy users of Claude Code ($70K+ spend per year) and have almost completely switched to codex CLI
Seeing comments like this all over the place. I switched to CC from Cursor in June / July because I saw the same types of comments. I switched from VSCode + Copilot about 8 months before that for the same reason. I remember being skeptical that this sort of thing was guerilla marketing, but CC was in fact better than Cursor. Guess I'll try Codex, and I guess that it's good that there are multiple competing products making big strides.
Never would have imagined myself ditching IDEs and workflows 3x in a few months. A little exhausting
No relations to them, but I've started using Happy[0]'s iOS app to start and continue Claude Code sessions on my iPhone. It allows me to run sessions on a custom environment, like a machine with a GPU to train models
I was just working on something similar for OpenCode - pushing it now in case it's useful for someone[0].
It can run in a front-end only mode (I'll put up a hosted version soon), and then you need to specify your OpenCode API server and it'll connect to it. Alternatively, it can spin up the API server itself and proxy it, and then you just need to expose (securely) the server to the internet.
The UI is responsive and my main idea was that I can easily continue directing the AI from my phone, but it's also of course possible to just spin up new sessions. So often I have an idea while I'm away from my keyboard, and being up able to just say "create an X" and let it do its thing while I'm on the go is quite exciting.
It doesn't spin up a special sandbox environment or anything like that, but you're really free to run it inside whatever sandboxing solution you want. And unlike Claude Code, you're of course free to choose whatever model you want.
I've been using Happy Coder[0] for some time now on web and mobile. I run it `--yolo` mode on an isolated VM across multiple projects.
With Happy, I managed to turn one of these Claude Code instances into a replacement for Claude that has all the MCP goodness I could ever want and more.
This is going to be extremely useful. A lot of people have hacked together similar things to get around waiting for CC to finish without mangling worktrees and branches manually.
I was curious how the 'Open in CLI' works - it copies a command to clipboard like 'claude --teleport session_XXXXX', which opens the same chat in the CLI, and checks out a new branch off origin/main which it's created for the thread, called 'claude/feature-name-XXXXX'.
I prefer not to use CC at the 'PR level' because it still needs too much hand-holding, so very happy to see that they've added this.
Update: Session titles are either being leaked between users or have a very bad LLM writing them. I'm seeing "Update Ton Blockchain Configuration" and "Retrieve Current PIN Code" for a project that has nothing to do with blockchain or PIN codes...
Been using both daily for three months. Different tools for different jobs.
Claude Code has better UX. Period. The permission system, rollbacks, plan mode - it's more polished. Iterative work feels natural. Quick fixes, exploratory coding, when I'm not sure exactly what I want yet - Claude wins.
Codex is more reliable when stakes are high. Hard problems. Multi-file refactors. Complex business logic. The model just grinds through it. Less hand-holding needed.
Here's the split I've landed on - Claude for fast iteration tasks where I'm actively involved. Codex for delegate-and-walk-away work that needs to be right first time.
Not about which is "better" - wrong question. It's about tooling vs model capability. Claude optimized the wrapper. OpenAI optimized the engine.
Personally, my one annoyance here is that it requires you to install a GitHub App that gives it direct write permissions to all code in your repos (in addition to issues, PRs, etc).
I'd much rather give it read permissions, have it work in its own clone, and then manually pull changes back through (either with a web review UI somehow, or just pulling the changes locally). Partly for security, partly just to provide a good review gate.
Would also allow using this with other people's repos, where I _can't_ give write permissions, which would be super helpful for exploring dependency repos, or doing more general research. I've found this super helpful with Claude Code locally but seems impossible on the web right now.
Pair programming is still one of the best ways to knowledge transfer between two programmers in a high throughput manner. Humans learn by doing, building synaptic connections.
I wonder if a shared Claude Code instance has the same effect?
So is this their version of Jules / Codex / Copilot agent? Aka autonomous agent in the cloud you give a task and it spits out a PR a bit later?
It’s interesting how all the LLMs slowly end up with the same feature set and picking one really ends up with personal preference.
Me as a dev am happy that I now have 4 autonomous engineers that I can delegate stuff to depending on task difficulty and rate limits. Even just Copilot + Codex has made me a lot more productive
Also rip to all the startups that tried to provide “Claude in the cloud”, though this was very predictable to happen
Very curious to see what usage limits are like for paid plans. Anthropic was already experiencing issues with high-volume model usage for Pro and Max users. I hope their infrastructure is able to adequately support running these additional coding environments on top of model inference.
Just to be clear, I'm excited for the capability to use Claude Code entirely within the browser. However, I've heard reports of Max users experiencing throttled usage limits in recent months, and am concerned as to whether this will exacerbate that issue or not.
It's interesting how most of these tools are (exclusively) Github.
We're on Gitlab for historic reasons. Where Github now has numerous opporuntities to use AI as part of your workflow, there's nothing in Gitlab (from what I can tell), unless you're paying big bucks.
I like using AI to boost my productivity. I'm surprised that that'll be the thing that makes me migrate to Github.
Nit about doing your AI interfaces on the Web: I really want claude.ai and chatgpt.com to offer a standard username+password login without 2FA. The kind my privacy-friendly browser of short-lived sessions can complete in a couple clicks, like for most other SaaSes, and then I'm in and using the tool.
I don't want to leak data either way by using some "let's throw SSO from a sketchy adtech company into the trust loop".
I don't want to wait a minute for Anthropic's login-by-email link, and have the process slam the brakes on my workflow and train of thought.
I don't want to wait a minute for OpenAI's MFA-by-email code (even though I disabled that in the account settings, it still did it).
I don't want to deal with desktop clients I don't trust, or that might not keep up with feature improvements. Nor have to kludge up a clumsy virtualization sandbox for an untrusted client, just to ask an LLM questions that could just be in a Web browser.
I got so used to having Claude Code read some of my MCP tools, and was bummed to see that it couldn't connect to them yet on the web.
Pretty cool though! Will need to use it for some more isolated work/code edits. Claude Code is now my workhorse for a ton of stuff including non-coding work (esp. with the right MCPs)
We’re moving almost entirely to Codex, first because often it’s just better, and second because it’s much cheaper. It’s a bet that they’re better now, but given capacity and funding, they’ll be better later too.
The only edge Claude has is context window, which we do sometimes hit, but I’m sure that gap will close.
[+] [-] mmaunder|4 months ago|reply
I don't have any relationship with any AI company, and honestly I was rooting for Anthropic, but Codex CLI is just way way better.
Also Codex CLI is cheaper than Claude Code.
I think Anthropic are going to have to somehow leapfrog OpenAI to regain the position they were in around June of this year. But right now they're being handed their hat.
[+] [-] latexr|4 months ago|reply
Which means it wasn’t true any of the previous times, so why would it be true this time? It feels like an endless loop of the “friendship ended” meme with AI companies.
https://knowyourmeme.com/editorials/guides/what-is-the-frien...
It’s much more likely commenters are still in the honeymoon hype phase and (again) haven’t found the problems because they’re hyper focused on what the new thing is good at that the previous one wasn’t, ignoring the other flaws. I see that a lot with human relationships as well, where people latch on to new partners because they obviously don’t have the big problem that was a strain on the previous relationship. But eventually something else arises. Rinse and repeat.
[+] [-] jswny|4 months ago|reply
- Good bash command permission system
- Rollbacks coupled with conversation and code
- Easy switching between approval modes (Claude had a keybind that makes this easy)
- Ability to send messages while it’s working (Codex just queues them up for after it’s done, Claude injects them into the current task)
- Codex is very frustrating when I have to keep allowing it to run the same commands over and over, Claude this works well when I approve it to run a command for the session
- Agents (these are very useful for controlling context)
- A real plan mode (crucial)
- Skills (these are basically just lazy loaded context and are amazing)
- The sandboxing in codex is so confusing, commands fail all the time because they try to log to some system directory or use internet access which is blocked by default and hard to figure out
- Codex prefers python snippets to bash commands which is very hard to permission and audit
When Codex gets to feature parity, I’ll seriously look at switching, but until then it’s just a really good model wrapped in an okay harness
[+] [-] pkreg01|4 months ago|reply
Initially, I found Codex CLI with GPT-5 to be a substitute for Claude Code - now GPT-5 Codex materially surpasses it in my line of work, with a huge asterisk. I work in a niche industry, and Codex has generally poor domain understanding of many of the critical attributes and concepts. Claude happens to have better background knowledge for my tasks, so I've found that Sonnet 4.5 with Claude Code generally does a better job at scaffolding any given new feature. Then, I call in Codex to implement actual functionality since Codex does not have the "You're absolutely right" and mocked/placeholder implementation issues of CC, and just generally writes clean, maintainable, well-planned code. It's the first time I've ever really felt the whole "it's as good as a senior engineer" hype - I think, in most cases, GPT5-Codex finally is as good as a senior engineer for my specific use case.
I think Codex is a generally better product with better pricing, typically 40-50% cheaper for about the same level of daily usage for me compared to CC. I agree that it will take a genuinely novel and material advancement to dethrone Codex now. I think the next frontier for coding agents is speed. I would use CC over Codex if it was 2x or 3x as fast, even at the same quality level. Otherwise, Codex will remain my workhorse.
[+] [-] bad_haircut72|4 months ago|reply
[+] [-] WXLCKNO|4 months ago|reply
100% of the time Codex has done a far better job according to both Codex and Claude Code when reviewing. Meeting all the requirements where Claude would leave things out, do them lazily or badly and lose track overall.
Codex high just feels much smarter and more capable than Claude currently and even though it's quite a bit slower, it's work that I don't have to go over again and again to get it to the standards I want.
[+] [-] swah|4 months ago|reply
Its very odd because I was hoping they were very on par.
[+] [-] cesarvarela|4 months ago|reply
[+] [-] maherbeg|4 months ago|reply
Kinda sick of Codex asking for approval to run tests for each test instance
[+] [-] lherron|4 months ago|reply
Also CC tool usage is so much better! Many, many times I’ve seen Codex writing a python script to edit a file which seems to bypass the diff view so you don’t really know what’s going on.
[+] [-] bcrosby95|4 months ago|reply
[+] [-] kelvinjps10|4 months ago|reply
[+] [-] catigula|4 months ago|reply
Claude feels like a better fit for an experienced engineer. He's a positive, eager little fellow.
[+] [-] hn_saver|4 months ago|reply
[+] [-] spoiler|4 months ago|reply
On the other hand, last time I tried GPT-5 from Cursor, it was so disappointing. It kept getting confused while we were iterating on a plan, and I had to explain to it multiple times that it's thinking about the problem the same way. After a while I gave up, opened a new chat and gave it my own summary of the conversation (with the wrong parts removed) and then it worked fine. Maybe my initial prompt was vague, but it continually seemed to forget course corrections in that chat.
I mostly tend to use them more to save me from typing, rather than asking it to design things. Occasionally we do a more open ended discussion, but those have great variance. It seems to do better with such discussions online than within the coding tool (I've bounced maths/implementation ideas off of while writing shaders on a personal project)
[+] [-] virtualritz|4 months ago|reply
However, Anthropic restricted Opus use for Max plan users 10 days or so ago severly (12-fold from 40h/week down to 5h week) [1].
Sonnet is a vastly inferioir model for my use cases (but still frequently writes better Rust code than Codex). So now I use Codex for planning and Sonnet for writing the code. However, I usually need about 3--5 loops with Codex reviewing, Sonnet fixing, rinse & repeat.
Before I could use one-shot Opus and review myself directly, and do one polish run following my review (also via Opus). That was possible from June--mid October but no more.
[1] https://github.com/anthropics/claude-code/issues/8449
[+] [-] p337|4 months ago|reply
I set up an MCP tool to use gpt-5 with high reasoning with Claude Code (like tools with "personas" like architect, security reviewer, etc), and I feel that it SIGNIFICANTLY amplifies the performance of Claude alone. I don't see other people using LLMs as tools in these environments, and it's making me wonder if I'm either missing something or somehow ahead of the curve.
Basically instead of "do x (with details)" I say "ask the architect tool for how you should implement X" and it gets into this back and forth that's more productive because it's forcing some "introspection" on the plan.
[+] [-] Rebuff5007|4 months ago|reply
Claude code has only been generally available since May last year (a year and half ago)... I'm surprised by the process that you are implying; within a year and a half, you both spent 70k on claude code, and knew enough about it and its competition to switch away from it? I dont think I'd be able to due diligence even if LLM evaluation was my fulltime job. Let alone the fact that the capabilities of each provider are changing dramatically every few weeks.
[+] [-] CompoundEyes|4 months ago|reply
[+] [-] didibus|4 months ago|reply
Fails to escalate permissions, gets derailed, loves changing too many things everywhere.
GPT5 is good, but codex is not.
[+] [-] dudeinhawaii|4 months ago|reply
* Is really only magic on Linux or WSL. Mediocre on Windows
* Is quite mediocre at UI code but exceptional at backend, engineering, ops, etc. (I use Claude to spruce up everything user facing -- Codex _can_ mirror designs already in place fairly well).
* Exceptional at certain languages, OK at others.
* GPT-5 and GPT-5-Codex are not the same. Both are models used by the Codex CLI and the GPT-5-Codex model is recent and fantastically good.
* Codex CLI is not "conversational" in the way that Claude is. You kind of interact with it differently.
I often wonder about the impact of different prompting styles. I think the WOW moment for me is that I am no longer returning to code to find tangled messes, duplicate silo'd versions of the same solution (in a different project in the same codebase), or strangely novice style coding and error handling.
As a developer for 20yrs+, using Codex running the GPT-5-Codex model has felt like working with a peer or near-peer for the first time ever. I've been able to move beyond smaller efforts and also make quite a lot of progress that didn't have to be undone/redone. I've used it for a solid month making phenomenal progress and able to offload as-if I had another developer.
Honestly, my biggest concern is that OpenAI is teasing this capable model and then pulls the rug in a month with an "update".
As for the topic at hand, I think Claude Code has without a doubt the best "harness" and interface. It's faster, painless, and has a very clean and readable way of laying out findings when troubleshooting. If there were a cheap and usable version of Opus... perhaps that would keep Claude Code on the cutting edge.
[+] [-] tstrimple|4 months ago|reply
I've been seeing more of this lately despite initial excellent results. Not sure what's going on, but the value is certainly dropping for me. I'll have to check out codex. CLI integration is critical for me at this point. For me it is the only thing that actually helps realize the benefits of LLM models we have today. My last NixOS install was completely managed by Claude Code and it worked very well. This was the result of my latest frustrations:
https://i.imgur.com/C4nykhA.png
Though I know the statement it made isn't "true". I've had much better luck pursuing other implementation paths with CC in the same space. I could have prompted around this and should have reset the context much earlier but I was drunk "coding" at that point and drove it into a corner.
[+] [-] slaymaker1907|4 months ago|reply
[+] [-] mi_lk|4 months ago|reply
[+] [-] unsupp0rted|4 months ago|reply
Initially, I had great success with codex medium- I could refactor with confidence, code generally ran on the first or second try, etc.
Then when that suddenly dumbed down to Claude Sonnet 3.5 quality I moved to GPT5 High to get back what had been lost. That was okay for a few days. Now GPT5 High has dropped to Claude Sonnet 3.5 quality.
There's nothing left to fallback to.
[+] [-] durron|4 months ago|reply
[+] [-] pinkbanana21|4 months ago|reply
Costs are 6x cheaper and it's way faster and good at test writing and tool calling. It some times can be a bit messy though so use Gemini or Claude or codex for that hard problems....
[+] [-] sabareesh|4 months ago|reply
[+] [-] poorman|4 months ago|reply
[+] [-] purnesh|4 months ago|reply
CC on the other hand feels more creative and has mostly given better UI.
Of course, once the page is ready, I switch to Codex to build further.
[+] [-] citizenpaul|4 months ago|reply
[+] [-] simonw|4 months ago|reply
It's really solid. It's effectively a web (and native mobile) UI over Claude Code CLI, more specifically "claude --dangerously-skip-permissions".
Anthropic have recognized that Claude Code where you don't have to approve every step is massively more productive and interesting than the default, so it's worth investing a lot of resources in sandboxing.
[+] [-] brynary|4 months ago|reply
- Claude Code has been added to iOS
- Claude Code on the Web allows for seamless switching to Claude Code CLI
- They have open sourced an OS-native sandboxing system which limits file system and network access _without_ needing containers
However, I find the emphasis on limiting the outbound network access somewhat puzzling because the allowlists invariably include domains like gist.github.com and dozens of others which act effectively as public CMS’es and would still permit exfiltration with just a bit of extra effort.
[+] [-] mdeeks|4 months ago|reply
AI coding should be tightly in the inner dev loop! PRs are a bad way to review and iterate on code. They are a last line of defense, not the primary way to develop.
Give me an isolated environment that is one click hooked up to Cursor/VSCode Remote SSH. It should be the default. I can't think of a single time that Claude or any other AI tool nailed the request on the first try (other than trivial things). I always need to touch it up or at least navigate around and validate it in my IDE.
[+] [-] jackconsidine|4 months ago|reply
Seeing comments like this all over the place. I switched to CC from Cursor in June / July because I saw the same types of comments. I switched from VSCode + Copilot about 8 months before that for the same reason. I remember being skeptical that this sort of thing was guerilla marketing, but CC was in fact better than Cursor. Guess I'll try Codex, and I guess that it's good that there are multiple competing products making big strides.
Never would have imagined myself ditching IDEs and workflows 3x in a few months. A little exhausting
[+] [-] ea016|4 months ago|reply
[0] https://github.com/slopus/happy/
[+] [-] yoavm|4 months ago|reply
It can run in a front-end only mode (I'll put up a hosted version soon), and then you need to specify your OpenCode API server and it'll connect to it. Alternatively, it can spin up the API server itself and proxy it, and then you just need to expose (securely) the server to the internet.
The UI is responsive and my main idea was that I can easily continue directing the AI from my phone, but it's also of course possible to just spin up new sessions. So often I have an idea while I'm away from my keyboard, and being up able to just say "create an X" and let it do its thing while I'm on the go is quite exciting.
It doesn't spin up a special sandbox environment or anything like that, but you're really free to run it inside whatever sandboxing solution you want. And unlike Claude Code, you're of course free to choose whatever model you want.
[0] https://github.com/bjesus/opencode-web
[+] [-] fny|4 months ago|reply
With Happy, I managed to turn one of these Claude Code instances into a replacement for Claude that has all the MCP goodness I could ever want and more.
[0]: https://happy.engineering/
[+] [-] nojs|4 months ago|reply
I was curious how the 'Open in CLI' works - it copies a command to clipboard like 'claude --teleport session_XXXXX', which opens the same chat in the CLI, and checks out a new branch off origin/main which it's created for the thread, called 'claude/feature-name-XXXXX'.
I prefer not to use CC at the 'PR level' because it still needs too much hand-holding, so very happy to see that they've added this.
Update: Session titles are either being leaked between users or have a very bad LLM writing them. I'm seeing "Update Ton Blockchain Configuration" and "Retrieve Current PIN Code" for a project that has nothing to do with blockchain or PIN codes...
[+] [-] Redster|4 months ago|reply
[+] [-] charlesabarnes|4 months ago|reply
[+] [-] teunlao|4 months ago|reply
Claude Code has better UX. Period. The permission system, rollbacks, plan mode - it's more polished. Iterative work feels natural. Quick fixes, exploratory coding, when I'm not sure exactly what I want yet - Claude wins.
Codex is more reliable when stakes are high. Hard problems. Multi-file refactors. Complex business logic. The model just grinds through it. Less hand-holding needed.
Here's the split I've landed on - Claude for fast iteration tasks where I'm actively involved. Codex for delegate-and-walk-away work that needs to be right first time.
Not about which is "better" - wrong question. It's about tooling vs model capability. Claude optimized the wrapper. OpenAI optimized the engine.
[+] [-] pimterry|4 months ago|reply
I'd much rather give it read permissions, have it work in its own clone, and then manually pull changes back through (either with a web review UI somehow, or just pulling the changes locally). Partly for security, partly just to provide a good review gate.
Would also allow using this with other people's repos, where I _can't_ give write permissions, which would be super helpful for exploring dependency repos, or doing more general research. I've found this super helpful with Claude Code locally but seems impossible on the web right now.
[+] [-] jryio|4 months ago|reply
I wonder if a shared Claude Code instance has the same effect?
[+] [-] artdigital|4 months ago|reply
It’s interesting how all the LLMs slowly end up with the same feature set and picking one really ends up with personal preference.
Me as a dev am happy that I now have 4 autonomous engineers that I can delegate stuff to depending on task difficulty and rate limits. Even just Copilot + Codex has made me a lot more productive
Also rip to all the startups that tried to provide “Claude in the cloud”, though this was very predictable to happen
[+] [-] ubj|4 months ago|reply
Just to be clear, I'm excited for the capability to use Claude Code entirely within the browser. However, I've heard reports of Max users experiencing throttled usage limits in recent months, and am concerned as to whether this will exacerbate that issue or not.
[+] [-] martypitt|4 months ago|reply
We're on Gitlab for historic reasons. Where Github now has numerous opporuntities to use AI as part of your workflow, there's nothing in Gitlab (from what I can tell), unless you're paying big bucks.
I like using AI to boost my productivity. I'm surprised that that'll be the thing that makes me migrate to Github.
[+] [-] neilv|4 months ago|reply
I don't want to leak data either way by using some "let's throw SSO from a sketchy adtech company into the trust loop".
I don't want to wait a minute for Anthropic's login-by-email link, and have the process slam the brakes on my workflow and train of thought.
I don't want to wait a minute for OpenAI's MFA-by-email code (even though I disabled that in the account settings, it still did it).
I don't want to deal with desktop clients I don't trust, or that might not keep up with feature improvements. Nor have to kludge up a clumsy virtualization sandbox for an untrusted client, just to ask an LLM questions that could just be in a Web browser.
[+] [-] jngiam1|4 months ago|reply
Pretty cool though! Will need to use it for some more isolated work/code edits. Claude Code is now my workhorse for a ton of stuff including non-coding work (esp. with the right MCPs)
[+] [-] mrcwinn|4 months ago|reply
The only edge Claude has is context window, which we do sometimes hit, but I’m sure that gap will close.