dudeinhawaii | 18 days ago
It strains belief that anyone working on a moderate-to-large project would not have hit the edge cases and issues. Every other day I discover and have to fix a bug that was introduced by Claude/Codex previously (something implemented just slightly incorrectly, or with a slightly wrong assumption).
Every engineer I know working on "mid-to-hard" problems (FANG and FANG-adjacent) has broken every LLM, including Opus 4.6, Gemini 3 Pro, and GPT-5.2-Codex, on routine tasks. Granted, the models have a very high success rate nowadays, but they fail in strange ways, and if you're well versed in your domain, these failures are easy to spot.
Granted, I guess if you're just saying "build this" and using "it runs and looks fine" as the benchmark, then OK.
All this is not to say Opus 4.5/6 is bad, not by a long shot, but your statement is difficult to accept as someone who's been coding a very long time and uses these agents daily. They're awesome but myopic.
minimaxir | 18 days ago
You might argue I'm No True Engineer because these aren't serious projects but I'd argue most successful uses of agentic coding aren't by FANG coders.
Denzel | 18 days ago
I think you and I have different definitions of “one-shotting”. If the model has to be steered, I don’t consider that a one-shot.
And you clearly "broke" the model a few times based on your prompt log, where the model was unable to solve the problem as specified.
Honestly, your experience in these repos matches my daily experience with these models almost exactly.
I want to see good/interesting work where the model is going off and doing its thing for multiple hours without supervision.