
Making PyTorch -> Qualcomm NPUs less treacherous

1 point | olokobayusuf | 4 days ago | muna.ai

1 comment

There are over 2.5 billion Qualcomm processors in the world today (PC, mobile, automotive, etc.). But bringing AI models to Qcom processors is a (massive) pain. Their 2GB+ SDK comes with an encyclopedia's worth of information that you need to absorb to deploy correctly.

We're working to make Qualcomm NPUs a first-class deployment target for PyTorch. Devs write a Python function that runs a PyTorch model, then use our `@compile` decorator to transpile the model to a Qcom-specific C++ implementation (DLC), which compiles to a self-contained shared library.
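To give a rough feel for the developer-facing shape of that workflow, here is a minimal sketch. It is not our actual API: the `compile` decorator below is a hypothetical stand-in written in plain Python, and the `tag`, `target`, and `CompileSpec` names are assumptions made up for illustration. It only records what a real toolchain would act on and leaves the function callable as-is.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Callable, List


@dataclass
class CompileSpec:
    """Hypothetical metadata a compiler toolchain might read later."""
    tag: str
    target: str
    stages: List[str]


def compile(tag: str, target: str = "qualcomm-npu") -> Callable:
    """Hypothetical stand-in for a `@compile` decorator.

    It does not transpile anything. It attaches a CompileSpec to the
    function and otherwise leaves its behavior unchanged.
    """
    def decorator(fn: Callable) -> Callable:
        @wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)

        # Stages mirror the description above: model -> DLC -> shared library.
        wrapper.spec = CompileSpec(
            tag=tag,
            target=target,
            stages=["trace PyTorch model", "emit DLC", "build shared library"],
        )
        return wrapper
    return decorator


@compile(tag="@demo/scale")
def predict(values: List[float]) -> List[float]:
    # Placeholder body: a real function would call a PyTorch model here.
    return [2.0 * v for v in values]
```

The decorated function still runs as ordinary Python, so it can be tested locally before any device-specific build happens. The attached `predict.spec` is only there to show where target information could live.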

Qualcomm NPUs are fast: 1.8x faster than ONNX Runtime. See the link above for details.