
(no title)

jokowueu | 1 year ago

How much more efficient are NPUs than GPUs? What are the limitations? It seems it will have support for DeepSeek R1 soon.


tamlin|1 year ago

A decent chunk of AI computation comes down to doing matrix multiplication fast. Part of that is reducing the amount of data transferred to and from the matrix multiplication hardware on the NPU or GPU; memory bandwidth is a significant bottleneck. The article highlights the use of 4-bit formats.
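
To make the bandwidth point concrete, here is a rough sketch of why a 4-bit format helps: two weights fit in one byte, so the same weights take a quarter of the bytes that FP16 would (ignoring the small overhead of the scale factor). The function names and the simple symmetric per-tensor scaling are my own assumptions for illustration, not the scheme the article or any particular NPU uses.

```python
# Illustrative only: symmetric 4-bit quantization with two values per byte.
# Function names and the scaling scheme are hypothetical, chosen for clarity.

def quantize_int4(weights):
    """Map floats to signed 4-bit integers in [-8, 7] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def pack_int4(q):
    """Pack pairs of 4-bit values into bytes, low nibble first."""
    if len(q) % 2:
        raise ValueError("need an even number of values")
    return bytes((lo & 0x0F) | ((hi & 0x0F) << 4)
                 for lo, hi in zip(q[0::2], q[1::2]))

def unpack_int4(packed):
    """Recover the signed 4-bit values from the packed bytes."""
    out = []
    for byte in packed:
        for nibble in (byte & 0x0F, byte >> 4):
            # Nibbles 8..15 encode the negative values -8..-1.
            out.append(nibble - 16 if nibble > 7 else nibble)
    return out

weights = [0.5, -1.0, 0.25, 0.75, -0.125, 0.0, 1.0, -0.5]
q, scale = quantize_int4(weights)
packed = pack_int4(q)

fp16_bytes = 2 * len(weights)   # FP16 stores each weight in 2 bytes
int4_bytes = len(packed)        # packed int4 stores 2 weights per byte
print(fp16_bytes, int4_bytes)   # 4x fewer bytes to move per weight
```

The hardware then multiplies the small integers and applies the scale afterwards; real schemes usually keep a scale per block of weights rather than one per tensor.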

GPUs are an evolving target. New GPUs have tensor cores and support all kinds of interesting numeric formats; older GPUs don't support many of the formats that AI workloads are using today (e.g. BF16, int4, all the various smaller FP types).

An NPU will be more efficient because it is much less general than a GPU and doesn't have any gates for graphics. However, it is also fairly restricted. Cloud hardware is orders of magnitude faster (due to much higher compute resources and I/O bandwidth), e.g. https://cloud.google.com/tpu/docs/v6e.

justincormack|1 year ago

The NPU also has no more memory bandwidth than the CPU, but then the GPU on these machines doesn't either.