homerowilson | 4 years ago
git clone https://github.com/xianyi/OpenBLAS && cd OpenBLAS && make && make PREFIX=/opt/openblas install && curl https://jott.live/code/blas_test.cc | sed -n "/<code>/,/code>/p" | tail -n +2 | head -n -1 > blas_test.cc
Inspect the blas_test.cc file, and then...
g++ -I/opt/openblas/include/ blas_test.cc -lopenblas -std=c++11 -O3 -L/opt/openblas/lib/ -o blas_test && ./blas_test 512 512 512 100 100
and got a peak of about 192 GFLOPS, averaging closer to 180. So yeah, the M1 is more than 6x faster in this simple single-precision matrix test.
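For anyone wondering where a GFLOPS number like that comes from, here is a rough sketch of the usual accounting for a dense matrix multiply. It is not taken from blas_test.cc; the function names `gemm_flops` and `gemm_gflops` are made up for illustration, and it assumes the conventional count of 2*m*n*k floating-point operations per multiply.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from blas_test.cc): FLOPs for one C = A * B with
// A of size m x k and B of size k x n. Each of the m*n outputs takes
// k multiplies and k adds, hence the factor of 2.
inline double gemm_flops(std::int64_t m, std::int64_t n, std::int64_t k) {
  return 2.0 * static_cast<double>(m) * static_cast<double>(n) *
         static_cast<double>(k);
}

// Hypothetical helper: throughput in GFLOPS for `iters` multiplies that took
// `seconds` of wall-clock time in total.
inline double gemm_gflops(std::int64_t m, std::int64_t n, std::int64_t k,
                          std::int64_t iters, double seconds) {
  assert(seconds > 0.0);  // a zero or negative duration is a timing bug
  return gemm_flops(m, n, k) * static_cast<double>(iters) / seconds / 1e9;
}
```

Under that convention a single 512x512x512 multiply is about 0.27 GFLOP, so 180 GFLOPS works out to roughly 1.5 ms per multiply.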
matja | 4 years ago
kitestramuort | 4 years ago
g++ -I/opt/intel/mkl/include/ blas_test.cc -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -std=c++11 -O3 -march=native -L/opt/intel/mkl/lib/intel64 -o blas_test_mkl
brrrrrm | 4 years ago
https://jott.live/raw/blas_test.cc