
domschl | 2 years ago

The main thing about this framework is that it uses memory unified with the GPU, which gives maximum performance. The Neural Engine, on the other hand, is optimized for low-energy inference (which is mostly an advantage on mobile devices) and imposes limitations and restrictions, since its hardware supports only very specific neural network operations. Thus supporting the Neural Engine within a universal machine learning platform doesn't make much sense; it would just be a bottleneck.

The way to use the Neural Engine is to convert existing models that strictly adhere to the limitations of its hardware (excluding many operations used in unrestricted NN models), for use in energy-constrained inference applications only. It's a different application scenario.


PrayagBhakar | 2 years ago

Could Transformer-based models be converted to work on the NPU?

reaperman|2 years ago

Thank you for all this specific information!