
zdyn5 | 2 years ago

From a high-level design standpoint, wouldn’t the general-purposeness of NVIDIA’s GPUs (even with their AI/LLM optimizations) put them at a disadvantage against more custom/dedicated inference designs? (Disregarding real-world issues like startup execution risk — assume competitors hit their engineering goals.) Or is there some fundamental architectural reason why NVIDIA can/will always be highly competitive in AI inference? Is the general-purposeness of the GPU not as much of an overhead/disadvantage as it seems?

Also how critical is NVIDIA’s infiniband networking advantage when it comes to inference workloads?


p1esk|2 years ago

Custom chips have to be much better than Nvidia to become attractive. Being 2x faster won’t be enough, 5x faster might be. Assuming perfectly functioning software.

zdyn5|2 years ago

Is software that important on the inference side, assuming all the key ops are supported by the compiler? Once the model is quantized and frozen, deployment to alternative chips, while somewhat cumbersome, hasn’t been too challenging, at least in my experience deploying to Qualcomm NPUs (with training done on NVIDIA).
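To make the "quantized and frozen" step concrete, here is a minimal sketch of symmetric per-tensor int8 weight quantization, the kind of transformation a deployment toolchain applies before handing a frozen model to an NPU. The function names and the pure-Python representation are illustrative assumptions, not any vendor's actual API.

```python
def quantize_int8(weights):
    """Map float weights to int8 values plus a per-tensor scale factor.

    Symmetric quantization: the largest absolute weight maps to +/-127,
    so zero stays exactly representable (zero-point is implicitly 0).
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    # Clamp to the int8 range in case of rounding at the extremes.
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights, as the target device would."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
```

Once the weights are in this form, the float originals can be discarded and only the int8 tensors plus scales ship to the device; the main per-chip work left is mapping the graph's ops onto the target compiler, which is where "all the key ops are supported" matters.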