Whatever the benchmarks might say, there's something about Claude that seems to deliver consistently reliable (although not always perfect) outputs across various coding tasks. I wonder what that 'secret sauce' might be and whether GPT-5 has figured it out too.
weego|6 months ago
Yesterday, without much prompting, Claude 4.1 gave me 10 phases, each with 5-12 tasks, that could genuinely be used to kanban out a product step by step.
Claude 3.7 Sonnet was effectively the same with fewer granular suggestions for programming strategies.
Gemini 2.5 gave me a one-pager back with some trivial bullet points in 3 phases, no tasks at all.
o3 did the same as Gemini, just less coherent.
Claude just has whatever the thing is for now.
unshavedyak|6 months ago
SequoiaHope|6 months ago
concinds|6 months ago
dudeinhawaii|6 months ago
Now, someone will say 'add more tests'. Sure. But that's a bandaid.
I find that the 'smarter' models like Gemini and o3 output better quality code overall, and if you can afford to send them the entire context in a non-agentic way, then they'll generate something dramatically superior to the agentic code artifacts.
That said, sometimes you just want speed to prove out a concept, and Claude is exceptional there. Unfortunately, proofs of concept often... become productionized rather than developers taking a step back to "do it right".
dagss|6 months ago
atonse|6 months ago
deadbabe|6 months ago
bamboozled|6 months ago