The task-based parallelism in LLVM leaves much to be desired, however. Ideally, you'd want a more efficient implementation.
But yeah, it's good enough to play with, though maybe not good enough to achieve high levels of performance. The SIMD stuff is probably simple enough to implement... maybe I should check out how well LLVM works with the OMP SIMD keywords.
Can you comment on your experience (or contact me) regarding implementation efficiency? We have recently implemented task-based parallelism in the J language with OpenMP[0]. Improvements or critiques are appreciated. The SIMD instructions there have been coded directly rather than via pragmas.
In LLVM or in libomp?
I don't know what omp simd is likely to get you over autovectorization. I know of cases where it was thought necessary (building with -fopenmp-simd, without -fopenmp), but it turned out not to be with recent GCC.
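For anyone who wants to try the comparison, here is a minimal sketch of the kind of loop where the pragma can matter. The function name `dot` is made up for illustration and doesn't come from any of the projects mentioned. A floating-point reduction is a typical case: without something like -ffast-math, the compiler generally can't reorder the additions on its own, while `omp simd reduction` explicitly permits it. Whether that beats plain autovectorization on a given compiler version is something to measure, not assume.

```c
#include <stddef.h>

/* Hypothetical example: dot product with an explicit SIMD reduction hint.
 * Built with -fopenmp-simd (GCC or Clang), the pragma is honored without
 * linking the OpenMP runtime. Built without any OpenMP flag, the pragma is
 * ignored and the loop is left to the autovectorizer (or runs scalar). */
float dot(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;

    /* Tell the compiler it may keep per-lane partial sums and combine
     * them at the end, i.e. reassociate the floating-point additions. */
    #pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];

    return sum;
}
```

Compiling this once with -O2 -fopenmp-simd and once with just -O2, then diffing the generated assembly, should show whether the pragma changes anything for your compiler.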
dragontamer|4 years ago
jpf0|4 years ago
[0] https://www.monument.ai/m/parallel
gnufx|4 years ago