>> I believe the trick with CPU math kernels is exploiting instruction level parallelism with fewer memory references
It's a collection of tricks to minimize all sorts of cache misses (L1, L2, TLB, page faults, etc.), improve register reuse, leverage SIMD instructions, transpose one of the matrices if that gives better spatial locality, and so on.
The trick is indeed to imagine how the CPU works with the Lx caches and keep as much of the working set in them as possible. So it's not only about exploiting fancy instructions, but also about thinking in engineering terms. Most software written in higher-level languages cannot use L1/L2 effectively, which is why algorithms of similar asymptotic complexity constantly end up slower in practice.
kpw94|1 year ago
https://www.cs.utexas.edu/users/pingali/CS378/2008sp/papers/...
larodi|1 year ago