top | item 44910353

(no title)

kleyd | 6 months ago

Current LLMs look a lot like a very advanced 'old brain' to me. While context engineering looks like optimizing the working memory.

What's missing is a part with more plasticity that can work in parallel and bi-directionally interact with the current static models in real-time.

This would mean individually trained models based on their experience so that knowledge is not translated to context, but to weight adjustments.

discuss

movpasd|6 months ago

That's also my view. It's clear that these models are more than pure language algorithms. Somewhere within the hidden layers are real, effective working models of how the world works. But the power of real humans is the ability to learn on-the-fly.

Disclaimer: These are my not-terribly-informed layperson's thoughts :^)

The attention mechanism does seem to give us a certain adaptability (especially in the context of research showing chain-of-thought "hidden reasoning") but I'm not sure that it's enough.

Thing is, earlier language models used recurrent units that would be able to store intermediate data, which would give more of a foothold for these kind of on-the-fly adjustments. And here is where the theory hits the brick wall of engineering. Transformers are not just a pure machine learning innovation, the key is that they are massively scalable, and my understand is part of this comes from the _lack_ of recurrence.

I guess this is where the interest in foundation models comes from. If you could take a codebase as a whole and turn it into effective training data to adjust the weights of an existing, more broadly-trained model, But is this possible with a single codebase's worth of data?

Here again we see the power of human intelligence at work: the ability to quite consciously develop new mental models even given very little data. I imagine this is made possible by leaning on very general internal world-models that let us predict the outcomes of even quite complex unseen ("out-of-distribution") situations, and that gives us extra data. It's what we experience as the frustrations and difficulties of the learning process.