edude03|5 months ago
Arguably that's a GPU? Other than (currently) exotic ways to run LLMs like photonics or giant SRAM tiles, there isn't a device that's better at inference than GPUs, and they have the benefit that they can be used for training as well. You need the same amount of memory and the same ability to do math as fast as possible whether it's inference or training.
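The memory point above can be sketched with a back-of-envelope calculation. This is my illustration, not from the thread: assuming fp16/bf16 weights at 2 bytes per parameter, the weights alone dominate the footprint and must be resident whether you're serving or training.

```python
# Rough GPU memory estimate for holding a model's weights.
# Assumption (mine, not the commenter's): fp16/bf16 weights at 2 bytes/param.

def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory just to hold the model weights, in decimal GB."""
    return n_params * bytes_per_param / 1e9

# A hypothetical 70B-parameter model needs ~140 GB for weights alone,
# resident on-device for inference and for training alike.
print(weight_memory_gb(70e9))  # 140.0
```

Training adds optimizer state and activations on top, and inference adds KV cache, but the baseline "hold the whole model and do math fast" requirement is shared — which is the commenter's point.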
CharlesW|5 months ago
Yes, and to @quadrature's point, NVIDIA is creating GPUs explicitly focused on inference, like the Rubin CPX: https://www.tomshardware.com/pc-components/gpus/nvidias-new-...
"…the company announced its approach to solving that problem with its Rubin CPX — Content Phase aXcelerator — that will sit next to Rubin GPUs and Vera CPUs to accelerate specific workloads."
edude03|5 months ago
In fact - I'd say we're looking at this backwards. GPUs used to be the thing that did math fast and put the result into a buffer where something else could draw it to a screen. Now a "GPU" is still a thing that does math fast, but sometimes you don't include the hardware to put the pixels on a screen.
So maybe - CPX is "just" a GPU but with more generic naming that aligns with its use cases.
imtringued|5 months ago
And no, the NPU isn't a GPU.
AzN1337c0d3r|5 months ago
Similarly, Tenstorrent seems to be building something you could consider "better", at least insofar as the goal is to be open.
nsteel|5 months ago
https://www.etched.com/announcing-etched