"Based on these public on-demand quoted prices from AWS and IDC, we found that the Intel® Gaudi® 2 has the best training performance-per-dollar, with an average advantage of 4.8x vs. the NVIDIA A100-80GB, 4.2x vs. the NVIDIA A100-40GB, and 5.19x vs. the NVIDIA H100."
Seems there's some friction in porting software, as you have to use their build of PyTorch. They claim you just have to change the specified device in your `.to(device)` calls, but if someone could verify that, it would be appreciated. My experience porting software to Google's TPUs or AMD GPUs has not been great.
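For what it's worth, a minimal sketch of what that "just change the device string" claim would look like, assuming Habana's published PyTorch bridge (`habana_frameworks`) and their device name `"hpu"`; the fallback logic here is my own addition, not something Intel documents as the porting recipe:

```python
import torch

# Hypothetical porting sketch: prefer Gaudi ("hpu") when Habana's PyTorch
# bridge is importable, otherwise fall back to CUDA or CPU. Only the
# device string changes; the model code itself is untouched.
try:
    import habana_frameworks.torch.core  # noqa: F401  (Habana's bridge package)
    device = "hpu"
except ImportError:
    device = "cuda" if torch.cuda.is_available() else "cpu"

# Ordinary PyTorch code, parameterized only by `device`.
model = torch.nn.Linear(4, 2).to(device)
x = torch.randn(3, 4).to(device)
y = model(x)
print(device, tuple(y.shape))
```

If the claim holds, everything outside the `try`/`except` is unchanged from a CUDA workflow; the open question is how much model code in practice hits ops the Gaudi backend doesn't support.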
I looked in their Intel Developer Cloud and saw the 8x instance at $10.42/hr, but no individual 1x Gaudi 2 that I could see. $1.30/hr (an eighth of the 8x price) could be okay for some inference use cases if it were available, although for what I was thinking, llama.cpp is not going to work anyway.
Kinda funny that instead of NVLink they're just using (presumably standard) 100GbE as their connector/protocol; I wonder if this also lets you wire up larger and more complex topologies of these cards across servers using normal 100GbE switches.
sailplease|2 years ago
ShamelessC|2 years ago
ilaksh|2 years ago
remexre|2 years ago