mikk14 | 2 years ago

I don't know if this is exactly what you are thinking about, but there are some physicists working to understand what happens in transformers: https://proceedings.neurips.cc/paper_files/paper/2023/file/b...

api|2 years ago

Is it really true that we don't understand why transformers work so well?

I mean, we obviously understand how they work at a purely mechanical level, and we have this analogy with lookup (keys, queries, values) and "attention," but do we really get it? Can someone explain to me why that design works so much better than lots of other things, like RNNs?

Or did we just tinker a lot (a method known as "graduate student descent") guided by mathematical hunches and loose analogies with biological brains until we found something that kinda worked?

It wouldn't be the first time. AFAIK we got the idea of wings from birds and figured out how to fly with them before we had a really solid fluid-mechanical understanding of how and why wings work the way they do. We just thought "hmm, so birds fly, so let's try stuff that looks a bit like that..."
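
For anyone who wants the "purely mechanical level" spelled out, here is a minimal sketch of the lookup analogy (keys, queries, values) as single-query scaled dot-product attention. It is plain Python with toy numbers I made up; the names `softmax` and `attend` are my own and not from any library, and real transformers add learned projections, multiple heads, and batching on top of this. It shows the mechanics only, not why the design works so well.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Soft lookup: weight each value by how well its key matches the query.

    Hypothetical helper for illustration; vectors are plain lists of floats.
    """
    d = len(query)
    # Similarity of the query to every key, scaled by sqrt(dimension).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    # Turn similarities into weights that are positive and sum to 1.
    weights = softmax(scores)
    # The result is a weighted average of the values, not a single hit.
    dim = len(values[0])
    output = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(dim)
    ]
    return weights, output

# Toy data (assumed): three keys, each paired with a one-dimensional value.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]

# The query points the same way as the first key, so that key dominates.
weights, output = attend([4.0, 0.0], keys, values)
print(weights, output)
```

Unlike a dictionary lookup, every key gets some weight, so the output is a blend pulled mostly toward the value of the best-matching key.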

ImHereToVote|2 years ago

We really don't have a mathematical theory for large complexity. We are kinda in the alchemy stage for this "science".