"Based on these public on-demand quoted prices from AWS and IDC, we found that the Intel® Gaudi® 2 has the best training performance-per-dollar, with an average advantage of 4.8x vs. the NVIDIA A100-80GB, 4.2x vs. the NVIDIA A100-40GB, and 5.19x vs. the NVIDIA H100"
ShamelessC|2 years ago
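To make the quoted metric concrete: performance-per-dollar is just training throughput divided by the on-demand price. A minimal sketch, with made-up throughput and price numbers (these are NOT the figures behind the quoted 4.8x / 4.2x / 5.19x claims):

```python
def perf_per_dollar(throughput_samples_per_sec: float, price_per_hour: float) -> float:
    """Samples processed per dollar of on-demand compute."""
    samples_per_hour = throughput_samples_per_sec * 3600
    return samples_per_hour / price_per_hour

# Hypothetical accelerators: (training throughput, hourly on-demand price).
accel_a = perf_per_dollar(1000.0, 10.0)   # 360,000 samples per dollar
accel_b = perf_per_dollar(750.0, 32.0)    # 84,375 samples per dollar

advantage = accel_a / accel_b
print(f"perf-per-dollar advantage: {advantage:.1f}x")  # 4.3x
```

The ratio is what gets reported as "Nx advantage"; a cheaper hourly price can outweigh a raw-throughput deficit.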
kkielhofner|2 years ago
However, as you note many of these implementations (Intel, AMD, Google TPU, etc) are more or less at the “get PyTorch to kind of work” stage.
I don’t know of many/any real world applications that are “vanilla” PyTorch at this point.
Stuff like FlashAttention (2), HF Accelerate/Optimum, distributed training implementations, DeepSpeed, custom CUDA kernels all over the place, TensorRT, PyTorch 2 compile, SDPA, serving frameworks, etc. The software stacks, and the resulting functionality, usability, and performance, that CUDA "owns" are truly endless.
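For readers who haven't met the acronym: SDPA is scaled dot-product attention, softmax(QKᵀ/√d)·V. A plain-Python reference of what PyTorch's fused `scaled_dot_product_attention` kernels compute (a sketch for illustration only; the CUDA implementations are fused, tiled, and memory-aware, which is the whole point of the ecosystem gap):

```python
import math

def scaled_dot_product_attention(q, k, v):
    """Reference SDPA over 2-D lists (seq_len x d):
    softmax(q @ k^T / sqrt(d)) @ v."""
    d = len(q[0])
    # scores[i][j] = <q_i, k_j> / sqrt(d)
    scores = [[sum(qi * kj for qi, kj in zip(qrow, krow)) / math.sqrt(d)
               for krow in k] for qrow in q]
    out = []
    for row in scores:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights = [e / z for e in exps]
        # output row i is the attention-weighted sum of value rows
        out.append([sum(w * vrow[j] for w, vrow in zip(weights, v))
                    for j in range(len(v[0]))])
    return out
```

Every one of the stacks listed above (FlashAttention, TensorRT, torch.compile backends) is ultimately shipping a faster way to evaluate this same expression.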
Any real project or implementation I’ve touched in the last year is so intertwined and dependent on CUDA it’s mind blowing and the CUDA lead is only increasing.
With AMD/ROCm as one example: even when you finally kind of get things to sort of work, and even though the hardware is potentially competitive on paper, the software ecosystem is so far behind that you're happy to pay the "Nvidia tax". Not only is CUDA significantly smoother overall, the endless stacks of CUDA-optimized software make any allegedly comparable implementations run at a fraction of the speed while also burning dev time left and right.
Love or hate Nvidia, the 15-year investment in and dominance of CUDA is very apparent to anyone who's actually working with this stuff and just trying to get something done.
Again, as you note, it's interesting to watch observers/casual users claim these implementations are competitive, because in my experience you go even one level deeper and it's a complete nightmare. I try ROCm every couple of months and end up laughing and/or shaking my head at just how far behind it still is (after six years).
I'm really rooting for them, but the reality is these CUDA "competitors" have a very, very long way to go.
ilaksh|2 years ago