top | item 44410677

(no title)

nmadden | 8 months ago

The improvements in programming are largely due to the adoption of “agentic” architectures. This is really a hybrid neural-symbolic approach: the symbolic part being the interpreter/compiler. Effectively the LLM still produces an almost-correct-but-wrong program and then the compiler “fact-checks” it and then the LLM basically local-searches its way from there to something that passes the compiler. (If you want to be disabused of the idea that LLMs on their own are good at programming, just review the “reasoning” log of one trying to fix a simple string | undefined error in Typescript).

It seems clear to me therefore that further improvements in programming ability will not come from better LLM models (which have not really improved much), but from better integration of more advanced compilers. That is, the more types of errors that can be caught by the compiler, the better chance of the AI fuzzing its way to a good overall solution. Interestingly, I hear anecdotally that current LLMs are not great at writing Rust, which does have an advanced type system able to capture more types of errors. That’s where I’d focus if I was working on this. But we should be clear that the improvements are already largely coming via symbolic means, not better LLMs.

I wrote some notes about a year ago about the irony of LLMs being considered a refutation of GOFAI when they are actually now firmly recapitulating that paradigm: https://neilmadden.blog/2024/06/30/machine-learning-and-the-...

discuss

NitpickLawyer|8 months ago

> The improvements in programming are largely due to the adoption of “agentic” architectures.

Yes, I agree. But it's not just the cradles, it's cradles + training on traces produced with those cradles. You can test this very easily with running old models w/ new cradles. They don't perform well at all. (one of the first things I did when guidance, a guided generation framework, launched ~2 years ago was to test code - compile - edit loops. There were signs of it working, but nothing compared to what we see today. That had to be trained into the models.)

> will not come from better LLM models (which have not really improved much), but from better integration of more advanced compilers.

Strong disagree. They have to work together. This is basically why RL is gaining a lot of traction in this space.

Also disagree on llms not improving much. Whatever they did with gemini 2.5 feels like gpt3-4 to me. The context updates are huge. This is the first model that can take 100k tokens and still work after that. They're doing something right to be able to support such large contexts with such good performance. I'd be surprised if gemini 2.5 is just gemini 1 + more data. Extremely surprised. There have to be architecture changes and improvements somewhere in there.

daveguy|8 months ago

> You can test this very easily with running old models w/ new cradles. They don't perform well at all.

This is because neither the LLMs nor the cradles are intelligent.

> They have to work together.

Exactly. Because they are essentially a single, brittle model. Not a "smart" text generator + a "smart" validation system.

LLMs are an enormous breakthrough in NLP and something like it will be part of an AGI system. But there is no path to AGI without more breakthroughs.