LukeB42 | 3 years ago

Can you explain like I'm 5 why this matters distinctly from how transformers are normally trained with autodiff and what its possible applications are?

adamnemecek|3 years ago

I’m talking about attention only transformers. Those don’t have an autodiff but still learn. The math is actually really cool.

lostmsu|3 years ago

> attention only transformers

Can you share any good link on the subject?