top | item 46472338

fooblaster | 1 month ago

Show me a single FPGA that can outperform a B200 at matrix multiplication (or even come close) at any usable precision.

In theory, a B200 can do 10 peta-ops at FP8.

I do agree memory bandwidth is also a problem for most FPGA setups, but Xilinx ships HBM with some SKUs, and even those are not competitive at inference as far as I know.

checker659|1 month ago

Said GPUs spend half the time just waiting for memory.

fooblaster|1 month ago

Yep, but they are still 50x faster than any FPGA.