top | item 30973806

(no title)

dsharlet | 3 years ago

If you are looking for something like this in C++, here's my attempt at implementing it: https://github.com/dsharlet/array#einstein-reductions

It doesn't do any automatic optimization of the loops like some of the projects linked in this thread, but, it provides all the tools needed for humans to express the code in a way that a good compiler can turn it into really good code.

discuss

order

andi999|3 years ago

How is "This is 40-50x faster than a naive C implementation of nested loops on my machine, " that possible/Do you know how this was achieved?

mynameismon|3 years ago

Not OP, but from a cursory glance at the code, it seems to be achieved with the combination of splitting the matrices into chunks to fit them in the L1/L2 caches (line 2 in the code), using tricks like switching indexes to achieve better cache locality, and using SIMD + Fused Multiply Add to further speedup things