Can you comment on your experience (or contact me) regarding implementation efficiency? We recently implemented task-based parallelism in the J language with OpenMP [0]. Improvements or critiques are appreciated. The SIMD instructions there are coded directly rather than generated via pragmas.

[0] https://www.monument.ai/m/parallel
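For readers unfamiliar with the pattern, here is a minimal sketch of task-based parallelism with OpenMP in C. It is not the J implementation. It assumes a divide-and-conquer sum over a `double` array, and `LEAF` is a hypothetical cutoff below which no more tasks are spawned. Unlike the J code described above, the leaf loop here requests vectorization with `#pragma omp simd` instead of hand-coded SIMD instructions, which is the alternative being contrasted.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical tuning parameter: below this many elements, stop spawning tasks. */
#define LEAF 4096

/* Leaf kernel: SIMD is requested via a pragma rather than intrinsics. */
static double sum_leaf(const double *a, size_t n) {
    double s = 0.0;
    #pragma omp simd reduction(+:s)
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Divide and conquer: each half becomes an OpenMP task. */
static double sum_tasks(const double *a, size_t n) {
    if (n <= LEAF)
        return sum_leaf(a, n);
    double left = 0.0, right = 0.0;
    size_t half = n / 2;
    #pragma omp task shared(left)
    left = sum_tasks(a, half);
    #pragma omp task shared(right)
    right = sum_tasks(a + half, n - half);
    #pragma omp taskwait /* both child tasks finish before we read left/right */
    return left + right;
}

/* One thread seeds the task tree; the rest of the team executes tasks. */
double parallel_sum(const double *a, size_t n) {
    assert(a != NULL || n == 0);
    double result = 0.0;
    #pragma omp parallel
    #pragma omp single
    result = sum_tasks(a, n);
    return result;
}
```

Compiled without `-fopenmp` the pragmas are ignored and the same code runs serially. Summing the values 1..1000000 with `parallel_sum` gives 500000500000 at any thread count, since every partial sum is an integer well within double precision.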
dragontamer|4 years ago
I'm looking at the benchmarks I used to rely on, and they're all from 2014 or earlier, so I really should double-check modern implementations. GCC 4.x and LLVM 3.x were an eternity ago, so their OpenMP performance is probably worth revisiting.
For example: https://www.phoronix.com/scan.php?page=article&item=llvm_cla...
And back then, it was fairly well known that the GCC and LLVM OpenMP implementations were slower than commercial ones (such as Intel ICC's or IBM's OpenMP implementation).
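To redo that kind of comparison on current compilers, one crude starting point is to time task creation directly. The following is a hypothetical microbenchmark, not the methodology of the linked article. It spawns many trivial tasks from a single thread and reports the average wall time per task in nanoseconds. Building the same file with GCC (libgomp) and with Clang (libomp) gives a rough like-for-like number. The name `task_overhead_ns` is illustrative only.

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock timer under OpenMP; serial fallback uses CPU time,
   which is close enough when only one thread runs. */
static double now_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Spawn ntasks trivial tasks; return average nanoseconds per task.
   *done receives how many tasks actually ran (sanity check). */
double task_overhead_ns(long ntasks, long *done) {
    assert(done != NULL);
    long count = 0;
    double t0 = now_seconds();
    #pragma omp parallel
    #pragma omp single
    {
        for (long i = 0; i < ntasks; i++) {
            #pragma omp task shared(count)
            {
                #pragma omp atomic
                count++;
            }
        }
        #pragma omp taskwait
    }
    double t1 = now_seconds();
    *done = count;
    return ntasks > 0 ? (t1 - t0) * 1e9 / ntasks : 0.0;
}
```

Build it with, for example, `gcc -O2 -fopenmp` and `clang -O2 -fopenmp`, then call `task_overhead_ns(1000000, &done)` from a small driver. The absolute numbers mean little. Only the ratio between runtimes on the same machine is informative.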