ROCm really is hit or miss depending on the use case.
Minks | 9 months ago:
Plus their consumer card support is questionable, to say the least. I really wish it were a viable alternative, but swapping to CUDA saved me some headaches and a ton of time. Having to run MIOpen benchmarks for HIP can take forever.
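(For anyone hitting the same wall, here's a minimal sketch, assuming a ROCm build of PyTorch and an AMD GPU, of the MIOpen tuning knobs that trade search time for kernel quality. The env var names come from MIOpen's docs; double-check the accepted values against your ROCm version.)

```python
# Sketch: steering MIOpen's kernel auto-tuning from a PyTorch/ROCm script.
# The env vars must be set before the first convolution runs.
import os

# FAST picks kernels from MIOpen's pre-tuned database and skips the long
# exhaustive search; the default mode may benchmark candidates at runtime.
os.environ.setdefault("MIOPEN_FIND_MODE", "FAST")

# Cache tuning results so repeat runs don't redo the search.
os.environ.setdefault("MIOPEN_USER_DB_PATH", os.path.expanduser("~/.miopen-cache"))

import torch
import torch.nn as nn

# On ROCm builds of PyTorch, the "cuda" device name maps to the AMD GPU.
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1).to("cuda")
x = torch.randn(8, 64, 56, 56, device="cuda")
y = conv(x)  # the first call triggers (or, with FAST, skips) the MIOpen find step
print(y.shape)
```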
m_mueller | 9 months ago:
Exactly the same thing has been said over and over again, ever since CUDA took off for scientific computing around 2010. I don't really understand why, 15 years later, AMD still hasn't been able to copy the recipe, and frankly it may be too late now, with all that mindshare behind NVIDIA's software stack.
alecco | 9 months ago:
Jensen knows what he is doing with the CUDA stack and workstations. AMD needs to beat that, more than thinking about bigger hardware. Most people are not going to risk years learning an arcane stack for an architecture that is used by less than 10% of the GPGPU market.
I'm willing to bet almost nobody you know calls the CUDA API directly. What AMD needs to focus on is getting the ROCm backend going for XLA and PyTorch. That would unlock a big slice of the market right there.
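To illustrate why the backend matters more than the raw API, here's a minimal sketch, assuming a GPU-enabled PyTorch build: ROCm wheels expose AMD GPUs through the very same torch.cuda interface, so typical model code runs unchanged on either vendor.

```python
# Sketch: the same model code runs on NVIDIA (CUDA) and AMD (ROCm) builds
# of PyTorch, because ROCm is surfaced through the torch.cuda interface.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"  # True on ROCm builds too
print(torch.version.hip or torch.version.cuda)  # which backend this build targets

model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(32, 1024, device=device)
print(model(x).sum().item())
```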
hyperbovine | 9 months ago:
They should also be dropping free AMD GPUs off helicopters, as NVIDIA did a decade or so ago, in order to build up an academic user base. Academia is getting totally squeezed by industry when it comes to AI compute; we're mostly running on hardware that's two or three generations out of date. If AMD came out with a well-supported GPU that cost half what an A100 sells for, voilà: you'd have cohort after cohort of grad students training models on AMD and then taking that know-how into industry.
pjmlp | 9 months ago:
Additionally, when people discuss CUDA they always think about C, ignoring that it has been C++-first since CUDA 3.0, that it also has Fortran support, and that NVIDIA has always embraced having multiple languages able to play in PTX land as well.
And as of 2025, there is a Python CUDA JIT DSL as well.
Also, because the CUDA SDK works on any consumer laptop with NVIDIA hardware (even if not the very latest version), anyone can slowly get into CUDA, even if their hardware isn't that great.
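As one concrete taste of "multiple languages on PTX", here's a minimal sketch using Numba's long-standing cuda.jit decorator (just one of several Python-to-PTX routes, and not the 2025 DSL mentioned above), assuming a CUDA-capable GPU and numba installed:

```python
# Minimal sketch: Python compiled down to PTX via Numba's CUDA JIT.
import numpy as np
from numba import cuda

@cuda.jit
def saxpy(a, x, y, out):
    i = cuda.grid(1)      # global thread index
    if i < out.size:      # guard against over-launch
        out[i] = a * x[i] + y[i]

n = 1 << 20
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)
out = np.zeros_like(x)

threads = 256
blocks = (n + threads - 1) // threads
saxpy[blocks, threads](2.0, x, y, out)  # arrays are copied to/from the GPU automatically
print(out[:4])
```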
cedws | 9 months ago:
At this point it looks to me like something is seriously broken internally at AMD, resulting in their lacklustre software stack. They've had a lot of time to talk to customers about their problems and spin up new teams, but as far as I've heard there's been very little progress, despite the enormous incentives.

I think Lisa Su is a great CEO, but perhaps she isn't shaking things up enough in the software department. She is from a hardware background, after all.

rbanffy | 9 months ago:
OTOH, by emphasizing datacenter hardware, they can cover a relatively small portfolio and maximize access to it via cloud providers. As much as I'd love to see an entry-level MI350-A workstation, that's not something that will likely happen.
AlexanderDhoore | 9 months ago:
Can someone with more knowledge give me an overview of what AMD is offering on the software side? Which SDKs do they offer that can do neural network inference and/or training? I'm asking because I looked into this a while ago and felt a bit overwhelmed by the number of options. It feels like AMD is trying many things at the same time, and I'm not sure where they're going with all of it.
numpad0 | 9 months ago:
FYI: ROCm support status currently isn't crucial for casual AI users - the standard proprietary AMD drivers have included Vulkan API support going back ~10 years. It's slower, but llama.cpp supports it, and so do many one-click automagic LLM apps like LM Studio.
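A minimal sketch of that path, using the llama-cpp-python bindings (the package, build flag, and model path here are illustrative; check llama.cpp's docs for the current Vulkan build flag):

```python
# Sketch: running a GGUF model through llama.cpp's Python bindings.
# With a Vulkan-enabled build, this offloads to any Vulkan-capable AMD GPU
# via the stock driver -- no ROCm install required.
# (Assumed install: CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python)
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # hypothetical local path
    n_gpu_layers=-1,  # offload every layer to the GPU backend
)

out = llm("Q: Why does framework support matter more than raw TFLOPS? A:",
          max_tokens=64)
print(out["choices"][0]["text"])
```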
user____name | 9 months ago:
Honestly, that was a hard read. I hope that guy gets an MI355 just for writing this.
halJordan | 9 months ago:
AMD deserves exactly zero of the credulity this writer heaps onto them. They just spent four months not supporting their RDNA4 lineup in ROCm after launch; AMD is, functionally, capable of day-120 support. None of the benchmarks disambiguated where the performance is coming from. They are 100% lying on some level, presenting their FP4 performance against FP8/FP16.
jchw | 9 months ago:
I still find their delay in properly investing in ROCm on client cards rather shocking, but in fairness they did finally announce that they would be supporting client cards on day 1 [1]. Of course, AMD has to keep that promise for it to matter, but they really do seem, for whatever reason, to have finally realized just how important it is that ROCm be well-supported across their entire stack (among many other investments they've announced recently).

It's baffling that AMD is the same company that makes both Ryzen and Radeon, but the year to date for Radeon has been very good, aside from official ROCm support for RDNA4 taking far too long. I wouldn't get overly optimistic; even if AMD finally commits hard to ROCm and Radeon, it doesn't mean they'll be able to compete effectively against NVIDIA. But the consumer showing so far, with the 9070 XT and FSR4, wasn't bad, so I'm cautiously optimistic they've decided to miss some opportunities to miss opportunities. Let's see how long these promises last... maybe longer than a Threadripper socket, if we're lucky :)

[1]: https://www.phoronix.com/news/AMD-ROCm-H2-2025
lhl | 9 months ago:
Last year I had issues using MI300X for training, and when it did work it was about 20-30% slower than an H100. But I'm doing some OpenRLHF (transformers/DeepSpeed-based) DPO training atm with the latest ROCm and PyTorch, and it seems to be doing OK, roughly matching an H200 in GPU-hour perf for small ~12h runs.

Note: my previous testing was on a single (8x) MI300X node, and currently I'm testing on just a single MI300X GPU, so it's not quite apples-to-apples. Multi-GPU/multi-node training is still a question mark; this is just a single data point.
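For anyone unfamiliar with DPO, here is a self-contained sketch of the objective at the heart of such runs -- the standard DPO formulation, not OpenRLHF's actual code:

```python
# Sketch: the core DPO objective on per-sequence log-probs (standard formulation).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss.

    Each argument is a tensor of summed log-probs for a batch of sequences.
    beta scales how far the policy may drift from the frozen reference model.
    """
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    # Maximize the log-sigmoid of how much more the policy prefers "chosen"
    # over "rejected" than the reference model does.
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

# Toy usage with random numbers standing in for real model log-probs.
b = torch.randn(4)
loss = dpo_loss(b + 1.0, b - 1.0, torch.randn(4), torch.randn(4))
print(loss.item())
```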
ethbr1 | 9 months ago:
You mean Ryan Smith of late AnandTech fame?
https://www.anandtech.com/author/85/
zombiwoof | 9 months ago:
AMD is a marketing company now.
moralestapia | 9 months ago:
Their MI300s already beat them, and MI400s are coming soon.