item 34727305

LukeB42 | 3 years ago
Can you explain like I'm 5 why this matters distinctly from how transformers are normally trained with autodiff and what its possible applications are?

adamnemecek | 3 years ago
I'm talking about attention only transformers. Those don't have an autodiff but still learn. The math is actually really cool.

lostmsu | 3 years ago
> attention only transformers
Can you share any good link on the subject?