DanHulton | 4 months ago

This is incorrect for a lot of reasons, many of which have already been explored, but also:

> with every new iteration of the AI, the internal code will get better

This is a claim that requires proof; it cannot simply be asserted as fact. Especially since there's a silent "appreciably" hidden between "get" and "better", and that improvement has been less and less apparent with each new model. In fact, it increasingly looks like "Moore's law for AI" is dead or dying, and we're approaching an upper limit where we'll need to find ways to be properly productive with models only effectively as good as what we already have!

Additionally, there's a relevant adage in computer science, often attributed to Brian Kernighan: "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." If the code being written is already at the frontier of these models' capabilities, how the hell are they supposed to fix the bugs that crop up, especially if we can't rely on them getting twice as smart? ("They won't write the bugs in the first place" is not a realistic answer, btw.)

CuriouslyC | 4 months ago

Just because you're not writing code where you can see that the new models are appreciably better doesn't mean they aren't. LLM progress now isn't in making the model magically appear smarter at the top end (that's in diminishing returns, as you imply), but in filling in weak points in knowledge, holes in capability, improving default processes, etc. That's relevant because it turns out that most of the time the LLM doesn't fail at coding because it's not a general super genius, but because it had a hole in its capabilities that caused it to be dumb in a specific scenario.

Additionally, while the intelligence floor is shooting up and the intelligence ceiling is very slowly rising, the models are also getting better at following directions, writing cleaner prose, and supporting longer contexts, so they can handle larger systems. The progress is still going strong; it just isn't well represented by top-line "IQ"-style tests.

LLMs and humans are good at dealing with different kinds of complexity. Humans can deal with messy imperative systems more easily assuming they have some real world intuition about it, whereas LLMs handily beat most humans when working with pure functions. It just so happens that messy imperative systems are bad for a number of reasons, so the fact that LLMs are really good at accelerating functional systems gives them an advantage. Since functional systems are harder to write but easier to reason about and test, this directly addresses the issue of comprehending code.
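To make that contrast concrete, here's a minimal sketch (a hypothetical example, not from the thread) of the same logic written as a pure function versus stateful imperative code. The pure version is a single input-to-output mapping that a model or a test can check in isolation; the imperative version forces the reader to track mutable state and call order:

```python
def total_with_discount(prices, rate):
    """Pure: the result depends only on the arguments, no hidden state."""
    return round(sum(prices) * (1 - rate), 2)


class Cart:
    """Imperative: correctness depends on mutable state and call order."""

    def __init__(self):
        self.prices = []
        self.rate = 0.0

    def add(self, price):
        self.prices.append(price)

    def set_discount(self, rate):
        self.rate = rate

    def total(self):
        return round(sum(self.prices) * (1 - self.rate), 2)


# The pure function can be verified by inspecting one call:
assert total_with_discount([10.0, 5.0], 0.1) == 13.5

# The imperative version requires reasoning about a sequence of calls;
# in messier systems the ordering often matters and is easy to get wrong.
cart = Cart()
cart.add(10.0)
cart.set_discount(0.1)
cart.add(5.0)
assert cart.total() == 13.5
```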

lugu | 4 months ago

The argument they are making is that if a bug is discovered, the agent will not debug it; instead, a new test case is created and the code is regenerated (presumably when a quick fix isn't found). That is why they don't need a debugging agent twice as capable as the coding agent. I don't know whether this works in practice, since in my experience tests are intertwined with the code base.
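As a rough sketch of that regenerate-instead-of-debug loop (every name here — `generate_code`, `run_tests`, the toy stubs — is a hypothetical placeholder, not any real agent API):

```python
def regenerate_until_green(spec, tests, generate_code, run_tests, max_rounds=5):
    """On failure, grow the test suite and regenerate; never patch the code."""
    for _ in range(max_rounds):
        code = generate_code(spec, tests)  # fresh implementation each round
        failures = run_tests(code, tests)  # list of failing test cases
        if not failures:
            return code
        tests = tests + failures           # failing cases become new tests
    raise RuntimeError("spec not satisfied within round budget")


# Toy stand-ins: the "spec" is doubling a number, and "generation" improves
# as the suite grows (simulating a model conditioned on more failing cases).
attempts = [lambda x: x + 2, lambda x: x * 2]  # second attempt is correct

def toy_generate(spec, tests):
    return attempts[min(len(tests) - 1, len(attempts) - 1)]

def toy_run(code, tests):
    return [t for t in tests if code(t) != t * 2]

impl = regenerate_until_green("double the input", [3], toy_generate, toy_run)
assert impl(5) == 10
```

Whether a real agent converges like this depends on the generator actually improving as the suite grows, which is exactly the commenter's doubt.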