shrubble|1 year ago
Netcob|1 year ago
In AI, that doesn't sound too surprising to me right now. I just experiment with some local LLMs, but the differences are pretty huge:

Llama 3 8B, Raspberry Pi 5: 2-3 tokens/second (but it works!)
Llama 3 8B, RTX 4080: ~60 tokens/second
Llama 3 8B, groq.com LPU: ~1300 tokens/second
Llama 3 70B, AMD 7800X3D: 1-2 tokens/second
Llama 3 70B, groq.com LPU: ~330 tokens/second

There seem to be huge gaps between CPUs, GPUs, and specialized inference ASICs. I'm guessing that right now there aren't many genius-level architecture breakthroughs, and that it's more about how much memory and silicon real estate you're willing to dedicate to AI inference.
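A rough sanity check on those gaps, as a sketch: batch-1 generation has to stream every weight from memory once per token, so tokens/second is bounded by bandwidth divided by model size. The quantization levels and bandwidth figures below are my own ballpark assumptions, not measurements.

    # Batch-1 LLM inference is roughly memory-bandwidth bound: every
    # weight is read once per generated token, so
    #   tokens/s <= bandwidth / bytes_of_weights.
    # All model sizes and bandwidths below are ballpark assumptions.

    def estimate_tps(weight_gb: float, bandwidth_gbs: float) -> float:
        """Upper bound on tokens/s if weight reads saturate bandwidth."""
        return bandwidth_gbs / weight_gb

    configs = {
        "Llama 3 8B Q4, Raspberry Pi 5 (~17 GB/s LPDDR4X)": (4.7, 17),
        "Llama 3 8B Q8, RTX 4080 (~717 GB/s GDDR6X)":       (8.5, 717),
        "Llama 3 70B Q4, 7800X3D (~80 GB/s DDR5)":          (40.0, 80),
    }

    for name, (gb, bw) in configs.items():
        print(f"{name}: <= {estimate_tps(gb, bw):.1f} tokens/s")

That prints roughly 3.6, 84, and 2.0 tokens/second, within a small factor of the numbers above, which is why HBM GPUs and Groq's SRAM-based LPUs pull so far ahead: they simply have far more bandwidth.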
wmf|1 year ago
ipsum2|1 year ago
Havoc|1 year ago
Even for marketing claims, that's pretty wild.

Still lots of trajectory left in the just-scale-it-up plan, it seems.
layoric|1 year ago
I think there is a close limit, considering most of these gains come from the reduced memory-bandwidth consumption of smaller data types. That would line up with Nvidia's crazy graph from yesterday, where the data types were specified.

How much lower can these go, though? 2-bit? 1.58-bit? 1-bit? These massive gains seem to have a very hard stop, one that AMD and Nvidia will use to raise their stock prices before it all comes to a sudden end.
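For context on the 1.58-bit figure: that's ternary weights, log2(3) ≈ 1.58 bits each. A minimal sketch of BitNet-b1.58-style absmean quantization, illustrative only and not any production kernel:

    import numpy as np

    # Ternary ("1.58-bit") weight quantization, BitNet b1.58 style:
    # each weight becomes -1, 0, or +1 plus one per-tensor scale.

    def quantize_ternary(w: np.ndarray):
        scale = np.abs(w).mean() + 1e-8           # absmean scale
        q = np.clip(np.round(w / scale), -1, 1)   # values in {-1, 0, +1}
        return q.astype(np.int8), scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q * scale

    w = np.random.normal(size=(4, 4)).astype(np.float32)
    q, s = quantize_ternary(w)
    print(q)                 # ternary weights
    print(dequantize(q, s))  # coarse reconstruction of w

Going from fp16 to ternary cuts weight traffic roughly 10x, but that is also about the floor: below ~1 bit per weight there is nothing left to shrink, which is the hard stop in question.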
jauntywundrkind|1 year ago
Such a weird & cruel modernity, where these releases are purely in the abstract. No, you still won't be able to buy an MI300X in Q4 2024. The enhanced edition will absolutely not be available.

(I miss the old PC era, when the world at large benefited in tandem from new things happening, or fell behind by not adapting.)
latchkey|1 year ago
mpreda|1 year ago
I think their "consumer GPU" did so bad recently that AMD could just as well, you know, simply liquidate the "consumer GPU" division and stop pretending.
I'm in the "consumer GPU" market myself; what AMD GPU do I buy today? -- Radeon Pro VII, launched in 2020 and the best AMD consumer GPU I can find today.
It's such a divide. I could optimize my software for such powerful GPUs as the Mi300 line.. but why do that, given that probably I won't even see one such GPU in my lifetime.
re-thc|1 year ago
Paper launches aren't anything new. They've always been a thing, especially in hardware.
forrestthewoods|1 year ago
Why not? Because they’re sold out to hyperscalers?
almostgotcaught|1 year ago
They're $15k; who exactly is disappointed they won't be able to buy one?
unknown|1 year ago
[deleted]
nabla9|1 year ago
DrNosferatu|1 year ago
AMD still has to prove themselves in this.
unknown|1 year ago
[deleted]