thesehands | 4 years ago

Transformers suffer from a quadratic bottleneck when computing self-attention: the score matrix grows as O(n²) in the sequence length n. Much work has gone into reducing this memory cost by being more selective about which attention scores actually get computed. This repo implements transformers with several of these noted improvements.
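For anyone unfamiliar with the bottleneck, here is a minimal sketch (not taken from the repo; function names and the `window` parameter are illustrative). Dense attention materializes an (n, n) score matrix, while a simple sparse variant, windowed/local attention, only scores keys near each query, cutting memory to O(n · window):

    import numpy as np

    def full_attention(q, k, v):
        # Dense attention: the (n, n) score matrix is the quadratic bottleneck.
        scores = q @ k.T / np.sqrt(q.shape[-1])           # (n, n) memory
        weights = np.exp(scores - scores.max(-1, keepdims=True))
        weights /= weights.sum(-1, keepdims=True)
        return weights @ v

    def local_attention(q, k, v, window=64):
        # Sparse variant: each query attends only to keys within `window`
        # positions, so memory is O(n * window) instead of O(n^2).
        n, d = q.shape
        out = np.empty_like(v)
        for i in range(n):
            lo, hi = max(0, i - window), min(n, i + window + 1)
            s = q[i] @ k[lo:hi].T / np.sqrt(d)
            w = np.exp(s - s.max())
            w /= w.sum()
            out[i] = w @ v[lo:hi]
        return out

    n, d = 512, 64
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
    print(full_attention(q, k, v).shape)   # (512, 64)
    print(local_attention(q, k, v).shape)  # (512, 64)

The sparse variants in the literature differ mainly in which key positions each query is allowed to see (local windows, strided patterns, learned routing, etc.), but the memory-saving idea is the same.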

No comments yet.