manbitesdog | 1 month ago

With such high throughput thanks to sparsity, I'm particularly interested in distilling it into other architectures. I'd like to try a recurrent transformer when I have the time.
