sdrg822 | 2 years ago
""" One may ask whether the conditionality introduced by the use of CMM does not make FFFs incompatible with the processes and hardware already in place for dense matrix multiplication and deep learning more broadly. In short, the answer is “No, it does not, save for some increased caching complexity." """
It's hard to beat the hardware lottery!
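The conditional matrix multiplication (CMM) the quote refers to can be pictured with a toy sketch: a depth-`d` binary tree of decision neurons routes each input to a single leaf, so only a tiny slice of the weights is touched per input. Everything below (sizes, names, the single-dot-product leaves) is illustrative, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, depth = 8, 3                           # illustrative sizes
n_nodes, n_leaves = 2**depth - 1, 2**depth

node_w = rng.normal(size=(n_nodes, d_in))    # one decision neuron per tree node
leaf_w = rng.normal(size=(n_leaves, d_in))   # one tiny "expert" per leaf

def fff_forward(x):
    """Route x down the binary tree, then apply only the chosen leaf."""
    node = 0
    for _ in range(depth):
        go_right = node_w[node] @ x > 0      # hard branch at inference time
        node = 2 * node + (2 if go_right else 1)
    leaf = node - n_nodes                    # index among the 2**depth leaves
    return leaf, leaf_w[leaf] @ x            # one dot product, not a full matmul

leaf, y = fff_forward(rng.normal(size=d_in))
print(leaf, y)
```

The caching complexity the quote mentions is visible here: which rows of `leaf_w` get read depends on the input, so memory access is data-dependent rather than the fixed, dense pattern GPUs are optimized for.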
algo_trader | 2 years ago
> We therefore leave the attention layers untouched
Meaning, presumably, that GPU memory remains the bottleneck.
Flops really are quite cheap by now, e.g. a vision inference chip runs at ~$2 per TFLOP/s!
marcinzm | 2 years ago
ashirviskas | 2 years ago
I'm really interested, can you share where you got these numbers?
theGnuMe | 2 years ago
YetAnotherNick | 2 years ago
An H100 is basically $2/hour for ~2000 TFLOP/s, i.e. about $1 for 4*10^18 floating point operations.
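The arithmetic checks out; here is the cost-per-flop calculation spelled out (the ~$2/hour rental price and ~2000 TFLOP/s peak throughput are the commenter's figures, not measured values):

```python
# Commenter's figures: ~$2/hour to rent an H100, ~2000 TFLOP/s peak throughput.
dollars_per_hour = 2.0
flops_per_second = 2000e12

flops_per_hour = flops_per_second * 3600            # 7.2e18 flops bought for $2
flops_per_dollar = flops_per_hour / dollars_per_hour

print(f"{flops_per_dollar:.1e} flops per dollar")   # → 3.6e+18 flops per dollar
```

3.6×10^18 rounds to the comment's "4*10^18 floating point operations per dollar".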