ribit | 1 year ago

Most NPUs are not directly end-user programmable. The vendor usually provides a custom SDK that allows you to run models created with popular frameworks on their NPUs. Apple is a good example since they have been doing it for a while. They provide a framework called CoreML and tools for converting ML models from frameworks such as PyTorch into a proprietary format that CoreML can work with.
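Conceptually, such a vendor converter walks a framework-level graph and lowers each layer into the vendor's proprietary op set, rejecting anything the NPU can't run. Here is a toy pure-Python sketch of that idea; the names (`VENDOR_OPS`, `convert`, the opcodes) are hypothetical and don't correspond to any real SDK:

```python
# Toy sketch of a vendor "model converter": lower framework layer names
# into a hypothetical proprietary opcode stream, refusing unsupported ops.
VENDOR_OPS = {          # framework layer -> made-up NPU opcode
    "conv2d": 0x01,
    "relu":   0x02,
    "linear": 0x03,
}

def convert(layers):
    """Lower a list of framework layer names into vendor opcodes."""
    program = []
    for layer in layers:
        if layer not in VENDOR_OPS:
            # Real converters fail (or fall back to CPU) the same way
            # when a model uses a layer the NPU doesn't support.
            raise ValueError(f"layer {layer!r} not supported by this NPU")
        program.append(VENDOR_OPS[layer])
    return bytes(program)

print(convert(["conv2d", "relu", "linear"]).hex())  # -> 010203
```

The point is that the user hands over a model, not a program: the SDK owns the mapping from layers to hardware, which is exactly what lets the vendor change the hardware underneath.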

The main reason for this lack of direct programmability is that NPUs are fast-evolving, optimized technology. Hiding the low-level interface allows the designer to change the hardware implementation without affecting end-user software. For example, some NPUs can only work with specific data formats or layer types. Early NPUs were very simple convolution engines based on DSPs; newer designs also have built-in support for common activation functions, normalization, and quantization.
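Quantization is a good concrete example of a format constraint: many NPUs only execute int8, so the toolchain has to map float tensors onto an integer grid plus a scale. A minimal sketch of symmetric per-tensor int8 quantization (one common scheme, not any specific vendor's):

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    amax = max(abs(v) for v in values) or 1.0   # guard against all-zero input
    scale = amax / 127.0                        # one scale for the whole tensor
    q = [max(-128, min(127, round(v * 127.0 / amax))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the int8 codes."""
    return [x * scale for x in q]

q, s = quantize_int8([0.5, -1.0, 0.25])
print(q)                 # -> [64, -127, 32]
print(dequantize(q, s))  # approximately [0.504, -1.0, 0.252]
```

Newer NPUs bake this quantize/dequantize step (and activations, normalization) into hardware, which is precisely the kind of implementation detail a stable low-level API would make awkward to change.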

Maybe one day, these things will mature enough to have a standard programming interface, but I am skeptical about that becoming a reality any time soon. Some companies (like Tenstorrent) are specifically working on open architectures that will be directly programmable, though I'm not sure whether their approach translates to embedded NPUs. What would be nice is an open graph-based API and a model format for specifying and encoding ML models.
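To make the last wish concrete, a graph-based model format could be nothing more than plain data: nodes naming an op and its inputs, which any runtime could consume or compile. A hypothetical minimal sketch (the format and the two-op interpreter are invented for illustration; real efforts like ONNX are far richer):

```python
import json

# A made-up minimal "open graph format": the model is just data, with
# nodes listed in topological order, so any backend could interpret or
# compile it for its own hardware.
graph = {
    "inputs": ["x"],
    "nodes": [
        {"name": "a", "op": "mul", "args": ["x", "x"]},  # a = x * x
        {"name": "y", "op": "add", "args": ["a", "x"]},  # y = a + x
    ],
    "outputs": ["y"],
}

OPS = {"add": lambda p, q: p + q, "mul": lambda p, q: p * q}

def run(graph, **feeds):
    """A trivial reference interpreter for the graph above."""
    env = dict(feeds)
    for node in graph["nodes"]:
        env[node["name"]] = OPS[node["op"]](*(env[a] for a in node["args"]))
    return [env[o] for o in graph["outputs"]]

print(run(graph, x=3))        # x*x + x = [12]
serialized = json.dumps(graph)  # the whole model serializes as plain JSON
```

Because the model is a declarative graph rather than imperative code, an NPU vendor is free to fuse, quantize, or reject nodes as its hardware dictates.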
