top | item 41809099

dhruvdh | 1 year ago

And yet Meta is using MI300X exclusively for all live inference on Llama 405B.

Clearly there are workloads AMD wins at, and just going Nvidia by default for everything without considering AMD is suboptimal.

samspenc|1 year ago

The difference is that Meta and the other FAANG companies make hundreds of billions of dollars in annual revenue and can hire top talent to make their AI run well on whatever GPU they choose for their data centers.

Consumers, open-source projects, and smaller companies unfortunately can't afford this, so they are fully dependent on AMD and other providers to close this implementation gap. Ironically, smaller companies may therefore prefer Nvidia just so they don't have to worry about odd GPU driver issues in their workloads.

ebalit|1 year ago

But Meta is the main company behind PyTorch development. If they make it work and upstream it, it will cascade to all PyTorch users.

We don't have to imagine very hard; it's slowly happening already. PyTorch on ROCm is getting better and better!
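One reason this cascades so easily: on ROCm builds of PyTorch, the AMD backend (HIP) is exposed through the same `torch.cuda` API, so device-agnostic code written for Nvidia runs unchanged. A minimal sketch, assuming a PyTorch install (CUDA, ROCm, or CPU-only):

```python
import torch

# On a ROCm build with a supported AMD GPU, torch.cuda.is_available()
# returns True, just as it does on an Nvidia CUDA build -- no code changes.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(4, 4, device=device)
y = (x @ x.T).relu()  # same ops dispatch to HIP kernels on ROCm
print(y.shape)
```

This is why upstreamed ROCm fixes benefit ordinary users: most PyTorch code never names the backend explicitly.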

They will also have to fix the split between data-center and consumer GPUs, for sure. From what I understand, this is on the roadmap with the convergence of both GPU lines on the UDNA architecture.

ErikBjare|1 year ago

If Meta/FAANG can make it work for them, it's not unreasonable to assume those improvements will trickle down to consumers/smaller companies.