Not meant to counter your argument, but at least one compiler out there (GCC) is, in my experience, very good at finding optimizations for x86 but fails most of the time on ARM unless you provide very clear and very strict hints in your code. NEON optimization is one such case. It wouldn't be the first time that GCC completely ignored the intrinsics in my loops or (I kid you not) introduced 16-bit Thumb code into my 32-bit ARM code. It's very frustrating to constantly have to second-guess your compiler.
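For illustration, here is a minimal sketch of the kind of "very clear and very strict hints" meant here. Everything in it is an assumption for the sake of the example: the function name `saxpy_hinted` is hypothetical, and so is the requirement that `n` be a multiple of 4. The `restrict` qualifiers promise the compiler that the two arrays don't overlap, and the NEON path is only compiled when the toolchain defines `__ARM_NEON`, with a plain scalar loop as the fallback.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: computes y[i] += a * x[i] for i in [0, n).
 * Assumptions (not from the original comment): n is a multiple of 4,
 * and x and y do not overlap, which is what `restrict` promises. */
void saxpy_hinted(float *restrict y, const float *restrict x,
                  float a, size_t n)
{
    assert(n % 4 == 0);

#if defined(__ARM_NEON)
    /* Explicit NEON path: process four floats per iteration. */
    for (size_t i = 0; i < n; i += 4) {
        float32x4_t vx = vld1q_f32(x + i);   /* load 4 floats from x */
        float32x4_t vy = vld1q_f32(y + i);   /* load 4 floats from y */
        vy = vmlaq_n_f32(vy, vx, a);         /* vy = vy + vx * a     */
        vst1q_f32(y + i, vy);                /* store 4 floats to y  */
    }
#else
    /* Scalar fallback: same result, left to the auto-vectorizer. */
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
#endif
}
```

On a 32-bit ARM target, something like `-O3 -mfpu=neon -marm` should keep the output in ARM state rather than Thumb, though whether GCC then generates what you'd hope for still has to be checked in the disassembly.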