Transformers suffer from a quadratic bottleneck when computing attention: every query attends to every key, so memory and compute grow with the square of the sequence length. Much work has investigated how memory can be saved by being more selective about which attention scores are actually computed. This repo implements transformers with these noted improvements.
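As a rough illustration of the idea (not this repo's actual API), the sketch below restricts each query to a local window of keys by masking the score matrix before the softmax. The function name `local_attention` and the `window` parameter are hypothetical; a real memory saving would also require a blocked or sparse kernel rather than masking a fully materialized score matrix.

```python
import torch

def local_attention(q, k, v, window: int):
    """Attend only to keys within `window` positions of each query.

    q, k, v: (batch, seq_len, dim) tensors.
    `window` controls how many neighbouring positions each query can see.
    """
    b, n, d = q.shape
    scores = q @ k.transpose(-2, -1) / d ** 0.5              # (b, n, n)
    # Positions outside the local window get weight 0 after the softmax.
    idx = torch.arange(n, device=q.device)
    allowed = (idx[None, :] - idx[:, None]).abs() <= window  # (n, n) bool mask
    scores = scores.masked_fill(~allowed, float("-inf"))
    attn = scores.softmax(dim=-1)
    return attn @ v                                          # (b, n, d)

# Example: sequence of length 8, dim 4, window of 2 positions on each side.
q = k = v = torch.randn(1, 8, 4)
out = local_attention(q, k, v, window=2)
print(out.shape)  # torch.Size([1, 8, 4])
```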