top | item 38378424

sdrg822 | 2 years ago

Cool. Important note:

> One may ask whether the conditionality introduced by the use of CMM does not make FFFs incompatible with the processes and hardware already in place for dense matrix multiplication and deep learning more broadly. In short, the answer is "No, it does not, save for some increased caching complexity."

It's hard to beat the hardware lottery!

algo_trader|2 years ago

In fact, as stated in the paper, this is bad news:

> We therefore leave the attention layers untouched

Meaning, presumably, that GPU memory remains the bottleneck.

Flops really are quite cheap by now, e.g. vision inference chip ~$2/teraflop/s !!

marcinzm|2 years ago

It's a bottleneck for larger models; however, this would presumably allow for cheaper models at scale, or on compute-constrained devices (like phones).

ashirviskas|2 years ago

>Flops really are quite cheap by now, e.g. vision inference chip ~$2/teraflop/s !!

I'm really interested, can you share where you got these numbers?

theGnuMe|2 years ago

There's another paper replacing attention with FF networks, so just combine the two and you've got something.

YetAnotherNick|2 years ago

> ~$2/teraflop/s

An H100 is basically ~$2/hour for ~2000 TFLOP/s, i.e. about $1 per 4*10^18 floating-point operations.
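The back-of-the-envelope arithmetic behind that figure, using the comment's assumed numbers (~$2/hour rental, ~2000 TFLOP/s sustained):

```python
tflops_per_s = 2000e12    # assumed sustained floating-point ops per second
price_per_hour = 2.0      # assumed dollars per hour

# Total ops bought per dollar: rate * seconds per hour / hourly price.
ops_per_dollar = tflops_per_s * 3600 / price_per_hour
print(f"{ops_per_dollar:.1e} ops per dollar")
```

That works out to ~3.6*10^18 ops per dollar, which the comment rounds to 4*10^18.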