top | item 45463764

quadrature|4 months ago

Not surprising that the hyperscalers will make this decision for inference and maybe even a large chunk of training. I wonder if it will spur nvidia to work on an inference-only accelerator.

edude03|4 months ago

> I wonder if it will spur nvidia to work on an inference only accelerator.

Arguably that's a GPU? Other than (currently) exotic ways to run LLMs, like photonics or giant SRAM tiles, there isn't a device that's better at inference than GPUs, and they have the benefit that they can be used for training as well. You need the same amount of memory and the same ability to do math as fast as possible whether it's inference or training.

CharlesW|4 months ago

> Arguably that's a GPU?

Yes, and to @quadrature's point, NVIDIA is creating GPUs explicitly focused on inference, like the Rubin CPX: https://www.tomshardware.com/pc-components/gpus/nvidias-new-...

"…the company announced its approach to solving that problem with its Rubin CPX — Content Phase aXcelerator — that will sit next to Rubin GPUs and Vera CPUs to accelerate specific workloads."

imtringued|4 months ago

The AMD NPU has more than 2x the performance per watt versus basically any Nvidia GPU. Nvidia isn't leading because they are power efficient.

And no, the NPU isn't a GPU.

AzN1337c0d3r|4 months ago

I would submit Google's TPUs are not GPUs.

Similarly, Tenstorrent seems to be building something that you could consider "better", at least insofar that the goal is to be open.

quadrature|4 months ago

I'm not very well versed, but I believe that training requires more memory because you have to store each layer's intermediate activations from the forward pass so that you can compute gradients for each layer during backprop.
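A rough back-of-the-envelope sketch of that asymmetry, with made-up sizes (the parameter count, per-layer activation size, and 2-bytes-per-value precision are all assumptions for illustration): inference reuses one activation buffer, while training keeps weights, gradients, and every layer's saved activations.

```python
def inference_bytes(params, layer_acts, bytes_per=2):
    # weights + a single activation buffer (reused from layer to layer)
    return (params + layer_acts) * bytes_per

def training_bytes(params, layer_acts, n_layers, bytes_per=2):
    # weights + gradients + activations saved at EVERY layer for backprop
    # (optimizer state like Adam's moments would add even more on top)
    return (params * 2 + layer_acts * n_layers) * bytes_per

params = 7_000_000_000   # hypothetical 7B-parameter model
acts   = 50_000_000      # hypothetical activations per layer for one batch
layers = 32

print(f"inference ~{inference_bytes(params, acts) / 1e9:.1f} GB")
print(f"training  ~{training_bytes(params, acts, layers) / 1e9:.1f} GB")
```

The exact numbers don't matter; the point is that the activation term scales with depth for training but not for inference.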

conradev|4 months ago

They’re already optimizing GPU die area for LLM inference over other pursuits: the FP64 units in the latest Blackwell GPUs were greatly reduced, and FP4 support was added.
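For intuition on how coarse FP4 is: in the E2M1 encoding there are only eight representable magnitudes (0, 0.5, 1, 1.5, 2, 3, 4, 6, plus a sign bit). A toy nearest-value quantizer (not how the hardware implements it, just an illustration of the value grid):

```python
# The eight non-negative values representable in FP4 E2M1.
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4(x):
    # Round |x| to the nearest representable magnitude, then restore the sign.
    sign = -1.0 if x < 0 else 1.0
    mag = min(FP4_E2M1, key=lambda v: abs(abs(x) - v))
    return sign * mag

print([quantize_fp4(v) for v in [0.3, 1.2, 2.4, 5.1, -7.0]])
```

Every weight lands on one of those 16 values, which is why FP4 units can be so much smaller and denser than FP64 ones.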