top | item 47204881


KellyCriterion | 15 hours ago

> there really is no moat.

For ChatGPT and Gemini, yes.

But for Claude, they have a very deep & big one: It's the only model that gets production-ready output on the first detailed prompt. Yesterday I used up my tokens by noon, so I tried some output from Gemini & Co. I presented a working piece of code which is already in production:

1. It silently changed things like "Touple.First.Date.Created" and "Touple.Second.Date.Created", breaking the code by renaming them to "Touple.FirstDate" and "Touple.SecondDate"

2. There was a const list of 12 definitions for a given context; when told to rewrite the function, it just cut 6 of these 12 definitions, making the code not compile. I asked why they were cut: "Sorry, I was just too lazy typing" ?? LOL

3. There is a list "_allGlobalItems" holding some items - it simply changed the name in the function to "_items", and the code didn't compile

As said, a working version of a similar function was given upfront.
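For illustration, the first failure mode might look something like this. The types below are a hypothetical reconstruction from the description above, not the actual production code (even "Touple" is kept as the commenter spelled it):

```csharp
using System;

// Hypothetical stand-ins for the production types described above.
public record Meta(DateTime Created);
public record Entry(Meta Date);
public record Touple(Entry First, Entry Second);

public static class Report
{
    // Original, working shape: nested access via First/Second.
    public static TimeSpan Age(Touple t) =>
        t.Second.Date.Created - t.First.Date.Created;

    // What the model rewrote it to: Touple has no FirstDate/SecondDate
    // members, so this version would not compile (kept commented out):
    // public static TimeSpan Age(Touple t) => t.SecondDate - t.FirstDate;
}
```

The rename looks harmless in isolation, which is exactly why it slips through: the new names read plausibly, but they reference members that don't exist.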

With Claude, I never have such issues.

ptnpzwqd|14 hours ago

I have used Claude (incl. Opus 4.6) fairly extensively, and Claude still spits out quality that is far below what I would call production-ready: littered with smaller issues, plus the occasional larger blunder. Particularly when doing anything non-trivial, and even when guiding it in detail (although that admittedly reduces the number of larger structural issues).

Maybe it is tech-stack dependent (I have mostly used it with C#/.NET), but I have heard people say the same for C#. The only conclusion I have been able to draw from this is that people have very different definitions of production-ready, but I would really like to see some concrete evidence of Claude one-shotting a larger/complex C# feature or the like (with or without detailed guidance).

KellyCriterion|12 hours ago

> C#/.NET

same here :)

> one-shots a larger/complex C# feature

I can show you a time-series data renderer which was created with 1 very large initial prompt and then 3 follow-up "change this and that" prompts. The file is around 5,000 lines and everything works fine & exactly as specified.

skeledrew|9 hours ago

I don't get it though. Why do you expect perfect responses? Humans continually make mistakes, and AI is trained on human data. Yet there seems to be this higher bar of expectation for the latter. Somehow people expect this thing that's been around for a few weeks/months, and cannot learn anything more beyond its training cutoff date, to always do a better job than a human who's been around for 20+ years and is able to learn on their own until death.

peteforde|13 hours ago

I see this over and over again. I don't dispute your experience. My experience with ESP32 development has been unreasonably positive. My codebase is sitting around 600k LoC and is the product of several hundred Opus 4.x Plan -> Agent -> Debug loops. I review everything that goes through, but I'm reviewing the business logic and domain gotchas, not dumb crap like what you and so many others describe.

What is so strange to me is that surely there is more C# out there than ESP-IDF code? I don't have a good explanation beyond saying that my codebase is extensively tested and used; I would know very quickly if it suddenly started shitting the bed in the way you explain.

je42|14 hours ago

Interesting - what kind of structural issues have you encountered?

Are these more related to the existing source code, or is this a bad pattern that you would never use regardless of the existing code?

AlecSchueler|14 hours ago

> It's the only model that gets production-ready output on the first detailed prompt. Yesterday I used up my tokens by noon, so I tried some output from Gemini & Co. I presented a working piece of code which is already in production:

One does often hear that LLMs shine at greenfield code generation but all start to struggle working with pre-existing code. It could be that this wasn't a like-for-like comparison.

That said, I personally do feel Claude produces far better results than competitors.

piva00|10 hours ago

> One does often hear that LLMs shine at greenfield code generation but all start to struggle working with pre-existing code. It could be that this wasn't a like-for-like comparison.

In my experience working in a large codebase with a good set of standards, that's not the case: I can supply existing examples from the codebase for Claude to use as guidance, and it generates quite decent code.

I think it's because there's already a lot of decent code for it to slurp up and derive from, plus good-quality tests at the functional level (so regressions are caught quickly).

I do understand, though, that on codebases with a hodgepodge of styles, varying quality of tests, etc., it probably doesn't work as well as in my experience. But I'm quite impressed by how I can do the thinking, add relevant sections of the code to the context (including protocols, APIs, etc.), describe what I need done, and get back a plan that most times is correct or very close to correct, which I can then iterate over to fix the gaps/mistakes it made, and get it implemented.

Of course, there are still tasks it fails at, and I don't like doing multiple iterations to correct course; those I do manually, with the odd usage here and there to refactor bits and pieces.

Overall, I believe that if your codebase was already healthy, you can have LLMs work quite well with pre-existing code.

jacquesm|13 hours ago

> One does often hear that LLMs shine at greenfield code generation but all start to struggle working with pre-existing code.

Don't we all?

ivan_gammel|13 hours ago

Greenfield implementation is not flawless either.

ben_w|15 hours ago

That's been my experience too. I'm using the recent free trial of OpenAI Plus to vibe code, and from this I would say that if Claude Code is a junior with 1-3 years of experience, OpenAI's Codex is like a student coder.

Oreb|14 hours ago

Does it depend on what type of programming you do? Doing Swift/SwiftUI work, I have exactly the opposite experience. I've been using both recently, and I want to use Claude alone (especially after last week's events), but Codex is just so much faster and better.

otabdeveloper4|13 hours ago

> It's the only model that gets production-ready output on the first detailed prompt.

That's, just, like, your opinion, man.

KellyCriterion|12 hours ago

...and the opinion of a lot of colleagues in and out of my sector :)

littlestymaar|15 hours ago

> But for Claude, they have a very deep & big one: It's the only model that gets production-ready output on the first detailed prompt

That's not a moat, though. Claude itself wasn't there 6 months ago, and there's no reason to think Chinese open models won't be at this level within a year at most.

To keep its current position, Claude has to keep improving at the same pace as the competition.
