top | item 45696688

Mehvix | 4 months ago

>None of that is specialized to run only transformers at this point

isn't this what [etched](https://www.etched.com/) is doing?

imtringued | 4 months ago

Only being able to run transformers is a silly concept, because attention consists of two matrix multiplications, which are the same standard operation used in feed-forward and convolutional layers. Basically, you get transformers for free.
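The claim can be sketched in a few lines of NumPy: generic scaled dot-product attention (not any particular chip's implementation), where the only heavy operations are two matrix multiplications.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: everything expensive here is a matmul.
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # matmul #1: (n, d) @ (d, n)
    return softmax(scores) @ V                # matmul #2: (n, n) @ (n, d)

rng = np.random.default_rng(0)
n, d = 4, 8
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (4, 8)
```

A hardware unit that does dense matmuls well already covers both steps; the softmax in between is comparatively cheap elementwise work.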

kadushka | 4 months ago

The devil is in the details.