(no title)
theresistor|1 year ago
It's also often about offload. Depending on the use case, the CPU and GPU may be busy with other tasks, so the NPU is free bandwidth that can be used without stealing from the others. Consider AI-powered photo filters: the GPU is probably busy rendering the preview, and the CPU is busy drawing UI and handling user inputs.
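For concreteness, here's a minimal sketch of steering that kind of work onto the NPU using Apple's coremltools Python package (the compute-unit enum is real; the model filename is a hypothetical stand-in):

    import coremltools as ct

    # Ask Core ML to schedule this model on the Neural Engine (with CPU
    # fallback), leaving the GPU free for rendering and the CPU for UI.
    # "FilterModel.mlpackage" is a hypothetical photo-filter model.
    model = ct.models.MLModel(
        "FilterModel.mlpackage",
        compute_units=ct.ComputeUnit.CPU_AND_NE,
    )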
cakoose|1 year ago
Without those, wouldn't it be better to spend the NPU's silicon budget on more CPU?
mapt|1 year ago
In this environment it makes some sense to use more efficient RISC cores, to spread the die out a bit with dedicated blocks that either won't be used all the time or will run at lower power draws, and to combine cores with better on-die memory availability (extreme L2/L3 caches) and other features. Apple even leaves some silicon in the power section as empty space, for thermal reasons.
Emily (formerly Anthony) on LTT had a piece on Apple's CPUs that pointed out some of the inherent advantages of the big-chip ARM SoC versus the x86 motherboard-daughterboard arrangement as we start to hit Moore's Wall. https://www.youtube.com/watch?v=LFQ3LkVF5sM
avianlyric|1 year ago
NPUs focus on one specific type of computation, matrix multiplication, usually with low-precision integers, because that's all a neural net needs. That vast reduction in flexibility means you can take lots of shortcuts in your design, allowing you to cram more compute into a smaller footprint.
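As a rough sketch of the workload being described (not any particular NPU's pipeline, just the int8-multiply/int32-accumulate pattern, using numpy and a made-up requantization scale):

    import numpy as np

    # Int8 inputs, as a quantized neural-net layer would use.
    a = np.random.randint(-128, 128, size=(64, 64), dtype=np.int8)
    b = np.random.randint(-128, 128, size=(64, 64), dtype=np.int8)

    # Accumulate in int32 so the dot products don't overflow...
    acc = a.astype(np.int32) @ b.astype(np.int32)

    # ...then requantize back to int8. The 0.02 scale is hypothetical;
    # in practice it comes from calibrating the model.
    out = np.clip(np.round(acc * 0.02), -128, 127).astype(np.int8)

Fixing the datatype and the operation like this is what lets the hardware drop most of the flexibility (and area) a general-purpose FPU needs.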
If you look at the M1 chip[1], you can see the entire 16-core Neural Engine has a footprint about the size of 4 performance cores (excluding their caches). It's not a perfect comparison without numbers on what the performance cores can achieve in ops/second versus the Neural Engine, but it seems reasonable to bet that the Neural Engine can handily outperform the performance core complex on matmul operations.
[1] https://www.anandtech.com/show/16226/apple-silicon-m1-a14-de...