Vetch | 8 months ago
I find Schmidhuber's claim on GANs tenuous at best, but his claim to have anticipated modern LLMs is very strong, especially if we are going to be awarding Nobel Prizes for Boltzmann machines. In https://people.idsia.ch/%7Ejuergen/FKI-147-91ocr.pdf, he really does concretely describe a model that unambiguously anticipated modern attention (technically, either an early form of hypernetworks or a more general form of linear attention, depending on which of its proposed update rules you use).
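The core mechanism that paper describes can be sketched in a few lines: a "slow" network emits key and value vectors that additively program a fast weight matrix, which is then read out with a query. Under one of the proposed update rules this is structurally the same as unnormalized linear attention. A minimal toy sketch (function name, shapes, and the equivalence check are mine, not from the paper):

```python
import numpy as np

def fast_weight_sequence(keys, values, queries):
    """Additive fast-weight update, read out by a query at each step.

    At step t the fast weight matrix W accumulates the outer product
    v_t k_t^T, and the output is W q_t. Expanding the sum shows this
    equals unnormalized linear attention: out_t = sum_{i<=t} v_i (k_i . q_t).
    """
    d_v, d_k = values.shape[1], keys.shape[1]
    W = np.zeros((d_v, d_k))           # fast weights, start at zero
    outputs = []
    for k, v, q in zip(keys, values, queries):
        W += np.outer(v, k)            # write: additive Hebbian-style update
        outputs.append(W @ q)          # read: query the fast weight matrix
    return np.array(outputs)

# Check the equivalence with the explicit attention-style sum.
rng = np.random.default_rng(0)
K, V, Q = rng.normal(size=(3, 5, 4))   # 5 steps, dimension 4
fw = fast_weight_sequence(K, V, Q)
lin = np.array([sum(V[i] * (K[i] @ Q[t]) for i in range(t + 1))
                for t in range(5)])
assert np.allclose(fw, lin)
```

The point of the check is that "programming a weight matrix with outer products" and "attending over all past key/value pairs" are the same computation written two ways.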
I also strongly disagree with the idea that his inability to practically apply his ideas held anything back. In the first place, it is uncommon for a discoverer or inventor to immediately grasp all the implications and applications of their work. Secondly, the key limiter was parallel processing power; it's not a coincidence that ANNs took off around the same time GPUs were transitioning away from fixed-function pipelines (and Schmidhuber's lab was a pioneer there too).
In the interim, when most researchers derided neural networks, his lab was one of the few that kept working on them and their application to sequence learning. Without those contributions, I'm confident Transformers would have happened later.
> It's clear to me no one read his early papers when developing GANs
This is likely true.
> self-supervision/transformers.
This is not true. Transformers came after lots of research on sequence learners, meta-learning, generalizing RNNs, and adaptive alignment. For example, Alex Graves' work on sequence transduction with RNNs eventually led to the direct precursor of modern attention, and Graves' work was itself influenced by his collaborations with Schmidhuber and by Schmidhuber's own research.