
Analog24 | 2 years ago

"neat future" is very ambiguous. At the moment there is nothing even close to transformers in terms of performance. I suspect you are right in general but I'm not sure about the "near future" part, there needs to be a pretty significant paradigm shift for that to happen (which is possible, of course, I just don't see any hints of it yet).


mkaic | 2 years ago

RWKV is an attention-free architecture that's showing promising scaling at a similar level to Transformers right now (roughly, its core token-mixing step is the recurrence sketched below). There's also the recent Hyena, which uses a new mechanism that's kind of a weird mix of attention, convolution, and implicit modelling all at once; it's shown promise as well. It remains to be seen whether these competing methods will truly scale as well as Transformers, but I've got my fingers crossed. Only a matter of time!
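
For a concrete sense of what "attention-free" means here, this is a minimal, numerically naive sketch of the WKV recurrence RWKV uses in place of attention. The names (w for per-channel decay, u for the current-token bonus, k/v for keys and values) loosely follow the RWKV paper, but it's just an illustration of the mechanism, not the real implementation (which works in log space for stability):

    # Minimal, numerically naive WKV recurrence (the attention replacement in RWKV).
    import numpy as np

    def wkv(k, v, w, u):
        """k, v: (T, C) keys/values; w, u: (C,). Returns (T, C) mixed outputs.
        The recurrent state is O(C) per step, independent of sequence length T."""
        T, C = k.shape
        num = np.zeros(C)      # decaying weighted sum of past values
        den = np.zeros(C)      # decaying sum of past weights
        out = np.empty((T, C))
        decay = np.exp(-w)
        for t in range(T):
            cur = np.exp(u + k[t])                     # current token gets a bonus u
            out[t] = (num + cur * v[t]) / (den + cur)  # weighted average over history
            num = decay * num + np.exp(k[t]) * v[t]    # fold current token into history
            den = decay * den + np.exp(k[t])
        return out

    rng = np.random.default_rng(0)
    y = wkv(rng.normal(size=(8, 4)), rng.normal(size=(8, 4)),
            w=np.full(4, 0.5), u=np.zeros(4))
    print(y.shape)  # (8, 4)

The key property is that fixed-size recurrent state: unlike attention, memory and per-token compute don't grow with context length.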

I agree that "near future" is quite ambiguous, though. If I were to disambiguate, I'd personally expect a Transformer-killing architecture to arise in the next 4-5 years.