Why AI systems don't learn – On autonomous learning from cognitive science

[+] Animats|15 days ago|reply

Not learning from new input may be a feature. Back in 2016 Microsoft launched one that did, and after one day of talking on Twitter it sounded like 4chan.[1] If all input is believed equally, there's a problem.

Today's locked-down pre-trained models at least have some consistency.

[1] https://www.bbc.com/news/technology-35890188

[+] Earw0rm|15 days ago|reply

Incredible to accomplish that in a day - it took the rest of the world another decade to make Twitter sound like 4chan, but thanks to Elon we got there in the end.

[+] armchairhacker|15 days ago|reply

I think models should be “forked”, and learn from subsets of input and themselves. Furthermore, individuals (or at least small groups) should have their own LLMs.

Sameness is bad for an LLM like it’s bad for a culture or species. Susceptible to the same tricks / memetic viruses / physical viruses, slow degradation (model collapse) and no improvement. I think we should experiment with different models, then take output from the best to train new ones, then repeat, like natural selection.

And sameness is mediocre. LLMs are boring, and in most tasks only almost as good as humans. Giving them the ability to learn may enable them to be “creative” and perform more tasks beyond humans.

[+] vasco|15 days ago|reply

That one 4chan troll delayed the launch of LLM-like stuff by Google for about 6 years. At least that's what I attribute it to.

[+] InfiniteLoup|15 days ago|reply

I was always curious about how Tay worked technically, since it was built before the Transformer era.

Was it based on a specific scientific paper or research?

The controversy surrounding it seems to have polluted any search for a technical breakdown, a discussion, or the insights gained from it.

[+] armoredkitten|15 days ago|reply

Exactly. The notion of online learning is not new, but that approach cedes a lot of control to unknown forces. From a theoretical standpoint, this paper is interesting, and there are definitely interesting questions to explore about how we could make an AI that learns autonomously. But in most production contexts, it's not desirable.

Imagine deploying a software product that changes over time in unknown ways -- could be good changes, could be bad, who knows? This goes beyond even making changes to a live system, it's letting the system react to the stream of data coming in and make changes to itself.

It's much preferable to lock down a model that is working well, release that, and then continue efforts to develop something better behind the scenes. It lets you treat it more like a software product with defined versions, release dates, etc., rather than some evolving organism.

[+] shevy-java|15 days ago|reply

> Back in 2016 Microsoft launched one that did, and after one day of talking on Twitter it sounded like 4chan.[1] If all input is believed equally, there's a problem.

Well, it shows that most humans degrade into 4chan eventually. AI just learned from that. :)

If aliens ever arrive here, send an AI to greet them. They will think we are totally deranged.

[+] fdghrtbrt|15 days ago|reply

> Not learning from new input may be a feature.

Ugh HN is so tedious with these remarks. These people are trying to get computers to learn, not just train on data, and HN goes nOt LeArNiNg Is A fEaTuRe. Where's the wonder and the curiosity?

[+] bsjshshsb|15 days ago|reply

Yes I like that /clear starts me at zero again and that feels nice but I am scared that'll go away.

Like when Google wasn't personalized so rank 3 for me is rank 3 for you. I like that predictability.

Obviously ignoring temperature but that is kinda ok with me.

[+] nsoonhui|15 days ago|reply

This is an astonishing claim and, if true, will make AI a lot less useful in real-life scenarios.

In real life, take programming as an example, we want Claude to be strong in capability at first, but what is more important is for it to learn our code base, be proficient in it, as it gains experience around it. In other words, become a domain expert.

Because our code base is proprietary I don't expect (nor do I want) the AI to be familiar with it on the first day. So learning on the job is the only way to go.

Only that way will it resemble a human programmer, and only then can we truly talk about replacing human programmers.

[+] rstuart4133|15 days ago|reply

> Not learning from new input may be a feature.

Learning is OpenClaw's distinguishing feature. It has an array of plugins that let it talk to various services - but lots of LLM applications have that.

What makes it unique is its memory architecture. It saves everything it sees and does. Unlike an LLM context, its memory never overflows. It can search for relevant bits on request. Its recall is nowhere near as good as the attention heads of an LLM, but apparently good enough to make a difference. Save + Recall == memory.
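
To make "Save + Recall == memory" concrete, here is a minimal sketch. It is not OpenClaw's actual implementation: the `Memory` class, its `save`/`recall` methods, and the keyword-overlap scoring are all assumptions made up for illustration, and a real system would more likely use embeddings or a search index.

```python
import re
from dataclasses import dataclass, field


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for real embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class Memory:
    """Append-only log plus keyword-overlap search (hypothetical design)."""
    entries: list[str] = field(default_factory=list)

    def save(self, text: str) -> None:
        # Nothing is ever evicted, so the store never "overflows".
        self.entries.append(text)

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Rank entries by how many query words they share; drop non-matches.
        q = tokenize(query)
        scored = [(len(q & tokenize(e)), e) for e in self.entries]
        scored = [(s, e) for s, e in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:k]]


memory = Memory()
memory.save("User prefers tabs over spaces in Go files")
memory.save("Deployed build 41 to staging on Tuesday")
memory.save("Staging database password rotated")
top = memory.recall("what happened on staging?", k=2)
```

Only the top `k` matches would be pasted back into the prompt, which is why recall can be much cruder than attention and still help.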

[+] moffkalast|15 days ago|reply

Yeah deep learning treats any training data as the absolute god given ground truth and will completely restructure the model to fit the dumbest shit you feed it.

The first LLMs were utter crap because of that, but once you have just one that's good enough it can be used for dataset filtering and everything gets exponentially better once the data is self consistent enough for there to be non-contradictory patterns to learn that don't ruin the gradient.
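
A rough sketch of that filtering step, under loud assumptions: `filter_dataset` and `toy_judge` are hypothetical names, and `toy_judge` is a trivial heuristic standing in for the "good enough" model that would do the scoring in practice.

```python
from typing import Callable


def filter_dataset(
    samples: list[str],
    quality: Callable[[str], float],
    threshold: float = 0.5,
) -> list[str]:
    """Keep only samples the judge scores at or above the threshold."""
    return [s for s in samples if quality(s) >= threshold]


def toy_judge(text: str) -> float:
    """Hypothetical stand-in for a model-based scorer: fraction of alphabetic words."""
    words = text.split()
    if not words:
        return 0.0
    return sum(w.isalpha() for w in words) / len(words)


raw = ["the cat sat on the mat", "@@@ ### 1234 !!!", "buy n0w cl1ck h3re", ""]
clean = filter_dataset(raw, toy_judge, threshold=0.8)
```

Swapping `toy_judge` for a call to an existing model is the whole trick: the next model only ever trains on `clean`.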

[+] theptip|15 days ago|reply

It’s interesting, LeCun seems to have a blind spot around in-context learning. I didn’t find one mention in this paper (only skimmed the full paper so far, so I may have missed it), which is odd as it is the way that agents come closest to autonomous learning in the real world.

I would say his core point does still apply; autonomous learning is not solved by ICL. But it seems a strawman to ignore the topic entirely and focus on training.

From what I see on the ground, some degree of autonomous learning is possible; Agents can already be set up to use meta-learning skills for skill authoring, introspection, rumination, etc - but these loops are not very effective currently.

I wonder if this is the myopic viewpoint of a scientist who doesn’t engage with the engineering of how these systems are actually used in the real world (ie “my work is done once Llama is released with X score on Y eval”) which results in a markedly different stance than the guys like Sutskever, Karpathy, Amodei who have built end-to-end systems and optimized for customer/business outcomes.

[+] zhangchen|16 days ago|reply

Has anyone tried implementing something like System M's meta-control switching in practice? Curious how you'd handle the reward signal for deciding when to switch between observation and active exploration without it collapsing into one mode.

[+] robot-wrangler|16 days ago|reply

> Curious how you'd handle the reward signal for deciding when to switch between observation and active exploration without it collapsing into one mode.

If you like biomimetic approaches to computer science, there's evidence that we want something besides neural networks. Whether we call such secondary systems emotions, hormones, or whatnot doesn't really matter much if the dynamics are useful. It seems at least possible that studying alignment-related topics is going to get us closer than any perspective that's purely focused on learning. Coincidentally, Quanta is on some related topics today: https://www.quantamagazine.org/once-thought-to-support-neuro...

[+] claud_ia|15 days ago|reply

[deleted]

[+] aanet|16 days ago|reply

by Emmanuel Dupoux, Yann LeCun, Jitendra Malik

"The proposed framework integrates learning from observation (System A) and learning from active behavior (System B) while flexibly switching between these learning modes as a function of internally generated meta-control signals (System M). We discuss how this could be built by taking inspiration on how organisms adapt to real-world, dynamic environments across evolutionary and developmental timescales."

[+] iFire|16 days ago|reply

https://github.com/plastic-labs/honcho has the idea of one sided observations for RAG.

[+] dasil003|16 days ago|reply

If this was done well in a way that was productive for corporate work, I suspect the AI would engage in Machiavellian maneuvering and deception that would make typical sociopathic CEOs look like Mister Rogers in comparison. And I'm not sure our legal and social structures have the capacity to absorb that without very very bad things happening.

[+] logicchains|15 days ago|reply

There's already a model capable of autonomous learning on the small scale, just nobody's tried to scale it up yet: https://arxiv.org/abs/2202.05780

[+] jdkee|16 days ago|reply

LeCun has been talking about his JEPA models for a while.

https://ai.meta.com/blog/yann-lecun-ai-model-i-jepa/

[+] Xunjin|15 days ago|reply

In this podcast episode[0] he does talk about this kind of model and how it "learns about physics" through experience instead of just ingesting theoretical material.

It's quite eye opening.

0. https://youtu.be/qvNCVYkHKfg

[+] shevy-java|15 days ago|reply

The whole AI field is a misnomer. It stole so much from neurobiology.

However, there will come a time when AI will really learn. My prediction is that it will come with different hardware; you already see huge strides here with regard to synthetic biology. While this still focuses more on biology, you'll eventually see a bridging effort; cyborg novels paved the way. Once you have real hardware that can learn, you'll also have real intelligence in AI too.

[+] utopiah|15 days ago|reply

I remember a joke from a few years ago that showed an "AI" that was "learning" on its "own", which meant periodically starting from scratch with a new training set curated by a large team of researchers, themselves relying on huge teams (far away) of annotators.

TL;DR: depends on where you define the boundaries of your "system".

[+] p_v_doom|15 days ago|reply

I think from a proper systemic view that joke is more correct than not. AI is just the frontend of people ...

[+] krinne|15 days ago|reply

But don't existing AI systems already learn in some way? The training steps are actually the AI learning already. If you have your training material being set up by something like Claude Code, then it kind of is already autonomous learning.

[+] LovelyButterfly|15 days ago|reply

Most, if not all, commercially available AI models are doing offline learning. Cognition is a skill that is only possible with online learning, which is the autonomous part the authors refer to: learning by observing and interacting.

In that sense, the "autonomous" part you describe simply means that the data comes from a different source. The model itself is still not free to explore and deduce from a knowledge base; it only infers from what is provided to it.
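
A toy illustration of the offline/online distinction, nothing like a real training setup: both "models" are just a single running estimate, and the class names and learning rate are made up for the example.

```python
class FrozenMean:
    """Offline: fit once on a fixed dataset, then never change."""

    def __init__(self, training_data: list[float]) -> None:
        self.estimate = sum(training_data) / len(training_data)

    def observe(self, x: float) -> None:
        pass  # deployed weights are locked; new input is ignored


class OnlineMean:
    """Online: nudge the estimate toward every new observation."""

    def __init__(self, initial: float, lr: float = 0.5) -> None:
        self.estimate = initial
        self.lr = lr

    def observe(self, x: float) -> None:
        self.estimate += self.lr * (x - self.estimate)


frozen = FrozenMean([1.0, 1.0, 1.0])
online = OnlineMean(initial=1.0)
for x in [5.0] * 10:  # the world shifts after deployment
    frozen.observe(x)
    online.observe(x)
```

After the shift, `frozen.estimate` is still 1.0 while `online.estimate` has moved to just under 5.0. The online learner tracks the new data, but it also believes whatever it is fed, which is the trade-off this thread keeps circling.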

[+] imtringued|15 days ago|reply

If you let the AI train on your prompts it will actually learn indirectly. It is still offline learning though.

[+] beernet|16 days ago|reply

The paper's critique of the 'data wall' and language-centrism is spot on. We’ve been treating AI training like an assembly line where the machine is passive, and then we wonder why it fails in non-stationary environments. It’s the ultimate 'padded room' architecture: the model is isolated from reality and relies on human-curated data to even function.

The proposed System M (Meta-control) is a nice theoretical fix, but the implementation is where the wheels usually come off. Integrating observation (A) and action (B) sounds great until the agent starts hallucinating its own feedback loops. Unless we can move away from this 'outsourced learning' where humans have to fix every domain mismatch, we're just building increasingly expensive parrots. I’m skeptical if 'bilevel optimization' is enough to bridge that gap or if we’re just adding another layer of complexity to a fundamentally limited transformer architecture.

[+] est|15 days ago|reply

"don't learn" might be a good feature from a business point of view

Imagine if AI learned all your source code and applied it for your competitor /facepalm

[+] tranchms|16 days ago|reply

We are rediscovering Cybernetics

[+] himata4113|15 days ago|reply

Eh, honestly? We're not that far away from models training themselves (opus 4.6 and codex 5.3 were both 'instrumental' in training themselves).

They're capable enough to put themselves in a loop and create improvement, which often includes processing new learnings from brute-forcing. It's not in real time, but that's probably a good thing, if anyone remembers Microsoft's Twitter attempt.

[+] tim333|15 days ago|reply

I was thinking in the same way that the human brain's design came about from evolutionary trial and error, we may be close to a situation where we can do something like that for the artificial neural networks and have the computers improve them by fiddling about.

[+] Garlef|15 days ago|reply

I think restricting this discussion to LLMs, as is often done, misses the point: LLMs + harnesses can actually learn.

117 comments