top | item 40872100

sbstp | 1 year ago

Maybe -march=native gives it an edge, since it compiles for this exact CPU model, whereas numpy is compiled for a more generic (older) x86-64 baseline. -march=native would probably reach x86-64-v4 on a recent Ryzen CPU (Zen 4 or later, which has AVX-512; older Zen cores top out at v3), whereas numpy is probably targeting v1 or v2.

https://en.wikipedia.org/wiki/X86-64#Microarchitecture_level...
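
For reference, a rough sketch of how one could check which x86-64 microarchitecture level a machine reaches, by reading the `flags` line of `/proc/cpuinfo`. This assumes Linux and the kernel's flag names (e.g. `pni` for SSE3, `abm` for LZCNT). The names `LEVEL_FLAGS`, `x86_64_level` and `read_cpu_flags` are made up for illustration, and the feature lists follow the level definitions in the linked article.

```python
from pathlib import Path

# Features required by each level, using Linux /proc/cpuinfo flag names.
# Each level also requires every level below it.
LEVEL_FLAGS = [
    (1, {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse", "sse2"}),
    (2, {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    (3, {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    (4, {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def x86_64_level(flags):
    """Return the highest x86-64-vN level fully covered by `flags` (0 if none)."""
    present = set(flags)
    level = 0
    for n, required in LEVEL_FLAGS:
        if not required <= present:
            break  # a missing feature caps the level here
        level = n
    return level


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the flag set of the first CPU listed (empty set if unavailable)."""
    p = Path(path)
    if not p.exists():  # not Linux, or /proc is not mounted
        return set()
    for line in p.read_text().splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


if __name__ == "__main__":
    # Prints v0 when the flags could not be read at all.
    print(f"x86-64-v{x86_64_level(read_cpu_flags())}")
```

This only reports what the CPU supports. Whether a given binary actually uses those instructions depends on how it was compiled or what it dispatches to at runtime.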

stingraycharles|1 year ago

Doesn’t numpy have runtime SIMD dispatching and whatnot based on CPU flags?

E.g. https://github.com/numpy/numpy/blob/main/numpy/_core/src/com...

KeplerBoy|1 year ago

np.matmul just uses whatever BLAS library your NumPy distribution was configured for/shipped with.

Could be MKL (I believe the conda version comes with it), but it could also be an ancient version of OpenBLAS you already had installed. So yeah, being faster than np.matmul probably just means your NumPy is not installed optimally.
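
A quick way to check this on your own install, as a sketch: `np.show_config()` prints the BLAS/LAPACK libraries NumPy was built against, and timing a large matmul gives a rough throughput figure to compare against other implementations. The helper names `matmul_gflops` and `time_matmul`, the matrix size, and the repeat count are arbitrary choices for illustration.

```python
import time


def matmul_gflops(n, seconds):
    """GFLOP/s for an (n x n) @ (n x n) matmul, counting 2*n**3 floating-point ops."""
    return 2 * n**3 / seconds / 1e9


def time_matmul(n=2048, repeats=5):
    """Return the best-of-`repeats` wall time in seconds for an n x n float64 matmul."""
    import numpy as np  # imported here so matmul_gflops is usable without NumPy

    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random((n, n))
    np.matmul(a, b)  # warm-up call, not timed
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.matmul(a, b)
        best = min(best, time.perf_counter() - start)
    return best
```

Calling `np.show_config()` and then `matmul_gflops(2048, time_matmul(2048))` shows which BLAS is in use and roughly how fast it runs. Thread count and CPU frequency scaling affect the number, so treat it as a ballpark figure only.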