top | item 30853188

servytor | 3 years ago

If you are interested in the M1 neural engine, I highly recommend you check out this[0].

[0]: https://github.com/geohot/tinygrad/tree/master/accel/ane

erwincoumans | 3 years ago

Yes, George Hotz (geohot) reverse engineered the Neural Engine and got it working for tinygrad; the videos posted in the other reply describe the reverse-engineering process.

I wonder why Apple didn't provide low-level APIs to access the hardware? It may have various restrictions. I recall Apple also didn't provide proper APIs to access OpenCL on iOS, but some people found workarounds for that as well. Maybe they only integrate with a few limited but important partners they can control, like TensorFlow and Adobe.

Could it be that using the ANE in the wrong way overheats the M1?

aseipp | 3 years ago

Because machine learning accelerators are, in the broadest sense, not "done" and are still evolving rapidly every year. Exposing too many details of the underlying architecture is a prime way to ossify your design, making it impossible to change, and as a result you will fall behind. It is possible the Neural Engine of 2022 will look very different from the one of 2025, as far as the specifics of the design, opcode set, etc. go.

One of the earliest lessons along this line was Itanium. Because Itanium exposed so much of the underlying architecture through its binary format and ABI, evolving the design later became extremely difficult, even if you could have magically solved all the compiler problems back in 2000. Most machine learning accelerators are some combination of a VLIW and/or systolic-array design. Most VLIW designers have learned that exposing the raw instruction pipeline to your users is a bad idea, not because it's impossibly difficult to use (compilers do in fact keep getting better), but because it makes change impossible later on. This is also why we got rid of delay slots in scalar ISAs, by the way; yes, they are annoying, but they also expose too much of the implementation pipeline, which is the much bigger issue.

Many machine learning companies take a similar approach, where you can only use high-level frameworks like TensorFlow to interact with the accelerator. This isn't something unique to Apple's playbook; it's common sense once you begin to design these things. In the case of other companies, there's also the benefit that it helps keep competitors away from their design secrets, but mostly it's for the same reason: exposing too much of the implementation details makes evolution and support extremely difficult.

It sounds crass but my bet is that if Apple exposed the internal details of the ANE and later changed it (which they will, 100% it is not "done") the only "outcome" would be a bunch of rageposting on internet forums like this one. Something like: "DAE Apple mothershitting STUPID for breaking backwards compatibility? This choice has caused US TO SUFFER, all because of their BAD ENGINEERING! If I was responsible I would have already open sourced macOS and designed 10 completely open source ML accelerators and named them all 'Linus "Freakin Epic" Torvalds #1-10' where you could program them directly with 1s and 0s and have backwards compatibility for 500 years, but people are SHEEP and so apple doesn't LET US!" This will be posted by a bunch of people who compiled "Hello world" for it one time six months ago and then are mad it doesn't "work" anymore on a computer they do not yet own.

> Could it be that using the ANE in the wrong way overheats the M1?

No.

fredoralive | 3 years ago

Possibly just to avoid programs relying too heavily on specific implementation details of the current engine, which would cause issues in the future if they decide to change the hardware design? An obvious comparison is graphics cards, where you don't get low-level access to the GPU[1], so vendors can change architecture details across generations.

Using a high-level API probably also makes it easier to implement a software version for hardware that doesn't have the Neural Engine, like Intel Macs or older A-series chips.

[1] Although this probably starts a long conversation about various GPU and ML core APIs and quite how low level they get.
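The fallback point above can be sketched in a few lines of Python. Every name here is invented for illustration (this is not any real Apple API): the idea is just that callers only ever see the high-level entry point, so the vendor is free to route work to an accelerator, change the accelerator, or fall back to plain software without breaking anyone.

```python
# Hypothetical sketch of a high-level inference API with a software
# fallback. All names are invented; none of this is a real Apple API.

def _ane_available() -> bool:
    # Stand-in for a real capability probe (e.g. a sysctl on macOS).
    return False  # pretend we're on an Intel Mac / older A-series chip

def _predict_on_ane(x: list[float]) -> list[float]:
    raise NotImplementedError("accelerator path, deliberately unspecified")

def _predict_on_cpu(x: list[float]) -> list[float]:
    # Trivial placeholder "model": scale every input by 2.
    return [2.0 * v for v in x]

def predict(x: list[float]) -> list[float]:
    """Public API: which backend runs is an implementation detail."""
    if _ane_available():
        return _predict_on_ane(x)
    return _predict_on_cpu(x)

print(predict([1.0, 2.5]))  # silently takes the CPU fallback path
```

Because the dispatch is private, swapping `_predict_on_ane` for a completely different accelerator generation changes nothing for callers of `predict`.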

mhh__ | 3 years ago

Apple don't want to let people get used to the internals, and spiritually they like to enforce a very clear us-versus-them philosophy when it comes to their new toys. They open source things they want other people to standardize around, but if it's their new toy then it's usually closed.

xenadu02 | 3 years ago

CoreML is the API to use the ANE.

exikyut | 3 years ago

The likeliest reason is long-term ABI ossification.

irae | 3 years ago

All the sibling comments are better guesses, but I would also guess there could be security implications to exposing lower-level access. Keeping it all proprietary and undocumented is itself a way of making it harder to exploit. Although, as mentioned, not having to settle on an ABI is more likely the primary reason.

WithinReason | 3 years ago

A high-level API needs much less support effort.