top | item 39914660

TheRealKing | 1 year ago

No, you can still trust compilers: 1) The hand-tuned BLAS routines are essentially a different algorithm with hard-coded information. 2) The default OpenBLAS uses OpenMP parallelism, so much of the speed likely comes from multithreading. Set the OMP_NUM_THREADS environment variable to 1 before running your benchmarks. You will still see a significant performance difference due to a few factors, such as the extra hard-coded information in the OpenBLAS implementation.
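One way to pin the thread count is sketched below, assuming the benchmark is launched as a separate process. The helper names `single_thread_env` and `run_benchmark` are hypothetical, not part of OpenBLAS or any library. The variable has to be in the environment before the BLAS library is loaded, which is why it is passed to the child process rather than set afterwards.

```python
import os
import subprocess
import sys


def single_thread_env(base=None):
    """Return a copy of the environment with OMP_NUM_THREADS forced to 1."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = "1"
    return env


def run_benchmark(cmd):
    """Run a benchmark command (hypothetical) single-threaded and return its stdout."""
    result = subprocess.run(
        cmd,
        env=single_thread_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a real benchmark: the child just reports the value it sees.
    child = [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
    print(run_benchmark(child).strip())
```

From a shell, the equivalent is to prefix the benchmark command with `OMP_NUM_THREADS=1`.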

marshallward|1 year ago

I ran with OMP_NUM_THREADS=1, but your point is well taken.

As for the original post, I felt a bit embarrassed about my original comments, but I think the compilers actually did fairly well based on what they were given, which I think is what you are saying in your first part.