For the larger performance differences, has anyone looked into why? Are there a couple of common reasons? I'd really like to know. Thanks.

sarah-ek|1 year ago

i have, yes. i can't speak for openblas or mkl, but i'm familiar with eigen's and nalgebra's implementations to some extent

nalgebra doesn't use blocking, so decompositions are handled one column (or row) at a time. this is great for small matrices, but scales poorly for larger ones
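to make the difference concrete, here is a rough sketch of a cholesky factorization done one column at a time versus in blocks. this is not nalgebra's or eigen's actual code: the function names, the pure-python lists and the `block` parameter are all made up for illustration. the point is only the loop structure. the blocked version does most of its work in step 3, which is a matrix-matrix product, and that is the part an optimized library can make cache-friendly.

```python
import math


def _factor_range(l, lo, hi):
    """Unblocked, in-place Cholesky of the square block l[lo:hi][lo:hi].

    Works one column at a time: scale the column, then apply a
    rank-1 update to the rest of the block (lower triangle only).
    """
    for j in range(lo, hi):
        l[j][j] = math.sqrt(l[j][j])
        for i in range(j + 1, hi):
            l[i][j] /= l[j][j]
        for c in range(j + 1, hi):
            for i in range(c, hi):
                l[i][c] -= l[i][j] * l[c][j]


def _zero_upper(l):
    """Clear the strictly upper triangle so the result is lower-triangular."""
    for i in range(len(l)):
        for j in range(i + 1, len(l)):
            l[i][j] = 0.0
    return l


def cholesky_unblocked(a):
    """Column-at-a-time factorization: returns L with L * L^T == a."""
    l = [[float(x) for x in row] for row in a]
    _factor_range(l, 0, len(l))
    return _zero_upper(l)


def cholesky_blocked(a, block=2):
    """Blocked factorization (hypothetical sketch): same result as above,
    but the work is grouped into `block`-wide panels."""
    n = len(a)
    l = [[float(x) for x in row] for row in a]
    for k in range(0, n, block):
        e = min(k + block, n)
        # step 1: factor the small diagonal block with the unblocked code
        _factor_range(l, k, e)
        # step 2: triangular solve for the panel below the diagonal block
        for i in range(e, n):
            for j in range(k, e):
                s = sum(l[i][p] * l[j][p] for p in range(k, j))
                l[i][j] = (l[i][j] - s) / l[j][j]
        # step 3: update the trailing matrix with a rank-`block` product
        # (this is the matrix-multiply-shaped part)
        for i in range(e, n):
            for c in range(e, i + 1):
                l[i][c] -= sum(l[i][p] * l[c][p] for p in range(k, e))
    return _zero_upper(l)
```

both functions should return the same factor for a symmetric positive definite input. in pure python the blocked one won't be faster, it only shows how the work gets regrouped.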

eigen uses blocking for most decompositions, other than the eigendecomposition, but they don't have a proper threading framework. the only operation that is properly multithreaded is matrix multiplication using openmp (and the unstable tensor module using a custom thread pool)