verditelabs | 1 year ago

The NPU is a Hexagon DSP with HVX (Hexagon Vector Extensions) and HMX (Hexagon Matrix Extensions). The core ISA and the vector ISA are pretty well documented and supported by upstream LLVM, but AFAIK HMX is not publicly documented. Core ISA + HVX by itself can probably get you to 1/3 to 1/2 or so of the theoretical peak TOPS for Hexagon. It's been a while since I've run code on device, but all the support code is in their SDK, and it's easy as pie to get it running on the simulator.

QCOM have said that LLMs up to ~13B will run at reasonable speeds, and I think that's a pretty good upper limit for ~40 TOPS and ~150 GB/s of bandwidth.
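For a rough sanity check on that 13B figure, here is a back-of-envelope sketch in Python. The 4-bit weight quantization and the "2 ops per parameter per token" rule of thumb are my assumptions, not anything QCOM has stated, and the function name is made up for illustration. It also ignores KV cache traffic, activations, and any overhead, so treat the numbers as loose upper bounds.

```python
def decode_tokens_per_second(params_billion, bits_per_weight, bandwidth_gb_s, peak_tops):
    """Loose upper bounds on single-stream decode speed (tokens/s).

    Assumptions (not from the original comment):
      - every weight is read from memory once per generated token
      - each parameter costs about 2 ops (one multiply, one add) per token
      - KV cache, activations, and scheduling overhead are ignored
    """
    params = params_billion * 1e9
    weight_bytes = params * bits_per_weight / 8

    # Memory-bound limit: how many full passes over the weights fit in one second.
    bandwidth_bound = bandwidth_gb_s * 1e9 / weight_bytes

    # Compute-bound limit: peak ops per second divided by ops per token.
    compute_bound = peak_tops * 1e12 / (2 * params)

    return bandwidth_bound, compute_bound, min(bandwidth_bound, compute_bound)


if __name__ == "__main__":
    # 13B parameters, assumed 4-bit weights, ~150 GB/s, ~40 TOPS
    bw, comp, limit = decode_tokens_per_second(13, 4, 150, 40)
    print(f"bandwidth-bound: {bw:.1f} tok/s")
    print(f"compute-bound:   {comp:.1f} tok/s")
    print(f"limiting factor: {'bandwidth' if limit == bw else 'compute'}")
```

Under those assumptions, token-by-token decode is limited by memory bandwidth long before it is limited by compute, which is why ~13B at ~150 GB/s looks like a sensible upper limit. At 8-bit or 16-bit weights the bandwidth bound drops proportionally.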
