sinatra | 2 months ago

Piggybacking on this post. Codex is not only finding much higher-quality issues, it's also writing code that rarely introduces new ones. Claude is much faster, but it definitely leaves serious quality issues behind.

So much so that now I rely completely on Codex for code reviews and actual coding. I will pick higher quality over speed every day. Please don’t change it, OpenAI team!

F7F7F7|2 months ago

Every plan Opus creates in Planning mode gets run through ChatGPT 5.2. It catches at least 3 or 4 serious issues that Claude didn't think of. It typically takes 2 or 3 back-and-forths for Claude to ultimately get it right.

I’m in Claude Code so often (x20 Max) and I’m so comfortable with my environment setup with hooks (for guardrails and context) that I haven’t given Codex a serious shot yet.

SkyPuncher|2 months ago

The same thing can be said about Opus running through Opus.

It's often not that a different model is better (well, it still has to be a good model). It's that the different chat has a different objective - and will identify different things.

derfurth|2 months ago

Thanks for the tip. I was dubious, but I tried GPT 5.2 on a large plan to start, and it was way better than reviewing it with Claude itself or Gemini. I then used it to help me with a feature I was reviewing, and it caught real discrepancies between the plan and the actual integration!

lostmsu|2 months ago

This makes me think: are there any "pair-programming" vibecoding tools that would use two different models and have them check each other?
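
A minimal sketch of what such a tool's core loop might look like: one model drafts, a second model reviews, and the draft is revised until the reviewer approves or a round limit is hit. All names here are hypothetical; a real tool would wire the two callables to different providers' APIs (e.g. one Claude, one GPT).

```python
# Hypothetical author/reviewer cross-check loop. The model callables
# are injected, so any two providers (or the same provider with a
# different system prompt) could back them.
from typing import Callable

Model = Callable[[str], str]  # prompt -> response

def pair_program(task: str, author: Model, reviewer: Model,
                 max_rounds: int = 3) -> str:
    """Have `author` draft a solution and `reviewer` critique it,
    looping until the reviewer replies APPROVED or rounds run out."""
    draft = author(f"Write code for this task:\n{task}")
    for _ in range(max_rounds):
        critique = reviewer(
            f"Review this solution for the task '{task}'. "
            f"Reply APPROVED if it is correct.\n{draft}"
        )
        if "APPROVED" in critique:
            break
        draft = author(
            f"Revise your solution to address this review:\n{critique}\n"
            f"Current draft:\n{draft}"
        )
    return draft
```

The point SkyPuncher makes below applies here too: the value comes less from the second model being stronger than from the reviewer prompt giving it a different objective.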

AmazingTurtle|2 months ago

Have you tried telling Claude not to leave serious quality issues behind?