domschl | 2 years ago

The Neural Engine is not helpful for training; it's inference hardware, whereas this project targets training and research. They use Accelerate and Metal (with seemingly similar or identical performance shaders to those used by their PyTorch adaptation), which allows for high-performance training.

This project additionally serves as documentation for how other platforms can integrate Apple Silicon support, which is good.

fmajid | 2 years ago

Still, being able to run LLaMA 2 on the NPU would be awesome due to the unified memory. Apple's restricting its use to only Apple-approved models is frankly irksome.

domschl | 2 years ago

The main thing about this framework is that it uses memory unified with the GPU, which gives maximum performance. The Neural Engine, on the other hand, is optimized for low-energy inference (mostly an advantage on mobile devices) and imposes limitations and restrictions, since its hardware supports only very specific neural-network operations. Supporting the Neural Engine within a universal machine-learning platform therefore doesn't make much sense; it would just be a bottleneck.

The way to use the Neural Engine is to convert existing models that strictly adhere to the limitations of its hardware (which excludes many operations used in unrestricted NN models), for use in energy-constrained inference applications only. It's a different application scenario.

LeanderK | 2 years ago

> Apple's restricting its use to only Apple-approved models is frankly irksome.

I thought you could run arbitrary networks via CoreML; there's just limited precision, and maybe not every operation is available?