Optimizing your programs for Arm platforms

26 points | Phyx | 1 year ago | community.arm.com

29 comments

astrange|1 year ago

This isn't a good article. I would say that if you're trying to rely on `restrict` and autovectorization you're doomed and should write it yourself. Even if it works on one compiler version, it won't work on all of them.

(It could possibly work in a language that isn't C and is designed for it; Fortran or shader programs are easier to autovectorize, and something like ISPC starts out "vectorized" and gets "autoscalarized".)

This is why ffmpeg writes SIMD in assembly and is more successful than all the people constantly replying "um actually you never need to write anything in assembly" to them.
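For readers unfamiliar with the technique being criticized, here is a minimal sketch of what "relying on `restrict` and autovectorization" looks like in C. The function names are hypothetical and not taken from the article. Whether either loop is actually vectorized depends on the compiler, its version, and the flags used, which is the fragility the comment describes.

```c
#include <stddef.h>

/* Plain pointers: the compiler must assume dst and src may overlap,
 * so it can only vectorize if it proves otherwise or adds a runtime check. */
void scale_add(float *dst, const float *src, float k, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}

/* restrict: the caller promises that dst and src do not overlap.
 * This permits vectorization without an overlap check, but does not force it.
 * Calling this with overlapping ranges is undefined behavior. */
void scale_add_restrict(float *restrict dst, const float *restrict src,
                        float k, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}
```

Both functions compute the same result for non-overlapping inputs; the only difference is what the compiler is allowed to assume.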

BoingBoomTschak|1 year ago

The big problem is that gcc/clang don't seem to have a concept of optimization notices, like SBCL does.

Nobody is more appropriate than the compiler to warn you that it couldn't optimize something costly and why.
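As a point of reference, both compilers do have opt-in vectorization reports, although they are off by default and less integrated than SBCL's notes: GCC accepts `-fopt-info-vec-missed` and Clang accepts `-Rpass-missed=loop-vectorize` (plus `-Rpass-analysis=loop-vectorize` for the reason). A small sketch of a loop that is likely to trigger such a report, because each iteration depends on the previous one; the function name is hypothetical and the exact diagnostic text varies by compiler version.

```c
#include <stddef.h>

/* In-place prefix sum: a[i] depends on a[i-1], a loop-carried dependency
 * that straightforward autovectorization typically cannot handle.
 *
 * To ask the compiler why it did not vectorize (flags as assumed above):
 *   gcc   -O3 -fopt-info-vec-missed -c prefix.c
 *   clang -O3 -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize -c prefix.c
 */
void prefix_sum(int *a, size_t n) {
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```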

ColonelPhantom|1 year ago

Aren't shader programs more like ISPC (or OpenCL/CUDA), in that the programming model is based around 'pretend each SIMD lane is a thread'?

SubjectToChange|1 year ago

Writing good assembly is a niche skill, especially SIMD assembly. Projects like ffmpeg are able to do it because they're pulling from a massive pool of contributors. In general writing raw assembly should be avoided unless you're genuinely in a position of knowing better.

> ...people constantly replying "um actually you never need to write anything in assembly" to them.

Honestly, who is saying that?

dzaima|1 year ago

Any modern compiler that bothers should be able to autovectorize most practical vectorizable things without issue, even without restrict. Of course there'll be some small inefficiencies or failed autovectorization sometimes, but small and big missed optimizations are in no way at all a problem unique to vectorization, so it's a moot point here.
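One reason vectorization can work without `restrict` is that GCC and LLVM can version a loop: they emit a runtime overlap check and take a vector path only when the ranges are disjoint. The sketch below writes that idea out by hand, as an illustration rather than a description of any compiler's exact output. The helper `ranges_overlap` and the function `add_arrays` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if [a, a+n) and [b, b+n) share any bytes.
 * Pointers are compared as integers, since relational comparison of
 * pointers into different objects is not defined in C. */
static int ranges_overlap(const float *a, const float *b, size_t n) {
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)(n * sizeof(float));
    return pa < pb + bytes && pb < pa + bytes;
}

void add_arrays(float *dst, const float *src, size_t n) {
    if (!ranges_overlap(dst, src, n)) {
        /* Disjoint: safe to promise no aliasing, so this loop is a
         * candidate for vectorization. */
        float *restrict d = dst;
        const float *restrict s = src;
        for (size_t i = 0; i < n; i++)
            d[i] += s[i];
    } else {
        /* Overlapping: keep strict in-order scalar semantics. */
        for (size_t i = 0; i < n; i++)
            dst[i] += src[i];
    }
}
```

The check costs a few instructions per call, which is usually negligible for long loops but is one of the "small inefficiencies" a hand-written version could avoid.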