almostdigital | 2 years ago
Python:        4.216 GFLOPS
Naive:         6.400 GFLOPS    1.52x faster than Python
Vectorized:   22.232 GFLOPS    5.27x faster than Python
Parallelized: 52.591 GFLOPS   12.47x faster than Python
Tiled:        60.888 GFLOPS   14.44x faster than Python
Unrolled:     62.514 GFLOPS   14.83x faster than Python
Accumulated: 506.209 GFLOPS  120.07x faster than Python
andy99 | 2 years ago
* For context, I do have some experience experimenting with the gcc/intel compiler options that are available for linear algebra. Even outside of BLAS, compiling with -O3 -ffast-math -funroll-loops etc. does a lot of that, and for simple loops, as in matrix-vector multiplication, compilers can easily vectorize. I'm very curious whether there is something I don't know about that would result in a speedup. See e.g. https://gist.github.com/rbitr/3b86154f78a0f0832e8bd171615236... for some basic playing around