From a high-level design standpoint, wouldn’t the general-purpose nature of NVIDIA’s GPUs (even with their AI/LLM optimizations) put them at a disadvantage compared to more custom, dedicated inference designs? (Disregarding real-world issues like startup execution risk — assume competitors succeed at their engineering goals.) Or is there some fundamental architectural reason why NVIDIA can and will always be highly competitive in AI inference? Is the general-purposeness of the GPU not as much of an overhead/disadvantage as it seems?

Also, how critical is NVIDIA’s InfiniBand networking advantage when it comes to inference workloads?