thentherewere2 | 2 years ago

RNNs and LSTMs did this in the past as well (but they can't be trained in parallel, since each token has to be compressed into the hidden state sequentially). Transformers ate their cake.
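To make the contrast concrete, here's a minimal NumPy sketch (function names and shapes are illustrative, not any particular library's API): the RNN's h_t depends on h_{t-1}, forcing a loop over tokens, while attention computes every token's output in one batched matrix product.

    import numpy as np

    # Toy RNN: the hidden state is updated one token at a time, so the
    # forward pass (and hence training) can't parallelize over the sequence.
    def rnn_forward(x, W_h, W_x):              # x: (T, d_in)
        h = np.zeros(W_h.shape[0])
        states = []
        for x_t in x:                          # strictly sequential: h_t needs h_{t-1}
            h = np.tanh(W_h @ h + W_x @ x_t)
            states.append(h)
        return np.stack(states)                # (T, d_h)

    # Toy single-head, unmasked attention: every token's output comes from
    # batched matrix products over the whole sequence at once.
    def attention_forward(x, W_q, W_k, W_v):
        Q, K, V = x @ W_q, x @ W_k, x @ W_v
        scores = Q @ K.T / np.sqrt(K.shape[-1])    # (T, T), all pairs at once
        weights = np.exp(scores - scores.max(-1, keepdims=True))
        weights /= weights.sum(-1, keepdims=True)  # row-wise softmax
        return weights @ V                         # (T, d_v)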

Newer methods are going back to similar concepts, trying to get past the old bottlenecks with what we've learned about transformers since then.
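One trick several of these newer approaches (e.g. state-space models, linear attention) lean on: if the recurrence is linear, h_t = a_t * h_{t-1} + b_t, the per-step updates compose associatively, so all prefix states can be computed with a parallel scan instead of a token-by-token loop. A rough sketch, assuming scalar states for simplicity (real models use vector or matrix states; names here are just illustrative):

    import numpy as np

    # h_t = a_t * h_{t-1} + b_t is a composition of affine maps, and
    # composition is associative:
    #   (a2, b2) after (a1, b1) = (a2*a1, a2*b1 + b2)
    # so the scan can run Hillis-Steele style: O(log T) rounds, each round
    # elementwise and fully parallelizable.
    def parallel_linear_recurrence(a, b):
        A, B = a.astype(float), b.astype(float)
        shift = 1
        while shift < len(A):
            # fold in the partial result sitting `shift` positions to the left
            A_prev, B_prev = A[:-shift], B[:-shift]
            A[shift:], B[shift:] = A[shift:] * A_prev, A[shift:] * B_prev + B[shift:]
            shift *= 2
        return B   # B[t] == h_t, with h_{-1} = 0

    # Sanity check against the sequential loop:
    # a, b = np.array([0.9, 0.8, 0.7]), np.array([1.0, 2.0, 3.0])
    # sequential h: [1.0, 2.8, 4.96]; parallel_linear_recurrence(a, b) agrees.

np.cumsum is the degenerate case a_t = 1; the same associativity is what lets recurrent-style models like S5 and Mamba train in parallel over the sequence.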
