Sounds like you are using ChatGPT to spit out a script in the chat? If so, you should give 5.2-codex or Claude Code with Opus 4.5 a try... it's night and day.
I don’t think so. My favorite tool is Codex with the 5.2-codex model. I use GitHub Copilot and Codex at work, and Codex and Cursor at home. Codex is better for harder and bigger tasks. I’ll use Copilot or Cursor for small, easy things. I think Codex is better than Claude Code as well.
I have GH Copilot from work and a personal Claude Code Max subscription, and I've noticed a difference in quality when I feed the same input prompts/requirements/spec/rules.md to Claude Code CLI and GH Copilot, both using Opus 4.5: Claude Code CLI gives better results.
Maybe there's more going on at inference time with Claude Code CLI?
I find this really frustrating and confusing about all of the coding models. These models are all ostensibly similar in their underpinnings and their basic methods of operation, right?
So, why does it all feel so fragile and like a gacha game?
In this case they are probably prompting it "wrong", or at least less well than Codex/Copilot/Claude Code/etc. That's not a criticism of the user. It reflects the fact that people have put a lot of work into the special case of using these particular tools and making sure they are prompted well, with context etc., whereas when you just type something into chat you would need to replicate that effort yourself in your own prompt.
ignoramous|1 month ago
Are these same models, when used through GitHub Copilot or Replit, as capable as (or at least comparable to) the respective first-party CLIs?
ggrantrowberry|1 month ago
discordance|1 month ago
Eufrat|1 month ago
FergusArgyll|1 month ago
davidmurdoch|1 month ago
usefulposter|1 month ago
seanhunter|1 month ago