top | item 46860237

These paid offerings geared toward software development must be a hell of a lot "smarter" than the regular chatbots. The amount of nonsense and bad or outright wrong code Gemini and ChatGPT throw at me lately is off the charts. I feel like they are getting dumber.

ghosty141|27 days ago

Yes, they are. The fact that the agents have full access to your local project files makes a gigantic difference.

They do *very* well at things like: "Explain what this class does" or "Find the biggest pain points of the project architecture".

No comparison to regular ChatGPT when it comes to software development. I suggest trying it out, not by saying "implement game" but by giving it clearly scoped tasks where the AI doesn't have to think or abstract/generalize. In other words, use it as some kind of code monkey.

zitterbewegung|27 days ago

I don’t understand why we are getting these software products that want to have vendor lock in when the underlying system isn’t being improved. I prefer Claude Code right now because it’s a better product. Gemini just has a weird context window that poisons the rest of the code generated (when online). Between ChatGPT Codex and Claude, I feel that Claude is the better product, but I don’t use enough tokens to justify Claude Pro at $100, so I just have a regular ChatGPT subscription for productivity tasks.

nkohari|27 days ago

> I don’t understand why we are getting these software products that want to have vendor lock in when the underlying system isn’t being improved.

I think it's clear now that model improvement is approaching an asymptote (or has at least reached a local maximum) and that the model itself provides no moat. (Every few weeks last year, the perception of "the best model" changed, based on basically nothing other than random vibes and hearsay.)

As a result, the labs are starting to focus on vertical integration (that is, building up the product stack) to deepen their moat.

mceachen|27 days ago

It’s the inconsistency that gets me. Very similar tasks, similar complexity, same code base, same prompting:

Session A knocks it out of the park. Chef’s kiss.

Session B just does some random vandalism.