top | item 47109436


menaerus | 7 days ago

Not sure what you're on, but I think what you said is incorrect. You can use a high-density HBM-enabled FPGA with (LP)DDR5 and a sufficient number of logic elements to implement the inference. The reason we don't see it in action is most likely that such FPGAs are insanely expensive and not as available off-the-shelf as GPUs are.


wmf|7 days ago

Yeah, FPGA+HBM works but it has no advantage over GPU+HBM. If you want to store weights in FPGA LUTs/SRAM for insane speed you're going to need a lot of FPGAs because each one has very little capacity.
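The capacity argument above can be sketched with some back-of-envelope arithmetic. The model size and per-chip SRAM figure below are illustrative assumptions, not vendor specs:

```python
# Rough capacity check: weights in on-chip FPGA SRAM.
# All numbers here are illustrative assumptions.

model_params = 7e9        # assume a 7B-parameter model
bytes_per_param = 2       # FP16 weights
weight_bytes = model_params * bytes_per_param  # ~14 GB of weights

# Assume a large FPGA offers on the order of 0.5 Gbit of
# on-chip block RAM / URAM, i.e. about 62.5 MB (an assumption).
fpga_sram_bytes = 0.5e9 / 8

fpgas_needed = weight_bytes / fpga_sram_bytes
print(f"weights: {weight_bytes / 1e9:.0f} GB")
print(f"FPGAs needed just to hold the weights on-chip: {fpgas_needed:.0f}")
```

Even with these generous assumptions, a modest 7B model needs on the order of hundreds of FPGAs just to hold the weights, before any compute is mapped.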

menaerus|6 days ago

Ok, then I may have misunderstood what you were saying. If the only thing we are interested in is storing all the weights in block RAM or LUTs, then yeah, that wouldn't be possible. I understood the OP's question a bit differently too.