chessgecko | 1 year ago

I remember reading that it’s too hard to get good memory bandwidth and L2 cache utilization with the fancy algorithms: you need to read contiguous blocks and be able to use them repeatedly. That said, I haven’t looked at the GPU BLAS implementations directly.
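
To make the "read contiguous blocks and use them repeatedly" point concrete, here is a minimal sketch of a blocked (tiled) matrix multiply. It is plain Python on the CPU, not GPU code, and it is not taken from any BLAS implementation; the names `matmul_naive`, `matmul_blocked`, and the `tile` parameter are made up for the example. The idea it illustrates is that each small tile of the inputs is walked row by row and reused for a whole tile of the output before moving on.

```python
def matmul_naive(a, b):
    """Textbook triple loop: the inner loop walks down a column of b,
    which is a strided (non-contiguous) access in row-major storage."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_blocked(a, b, tile=2):
    """Blocked variant: work on tile x tile sub-blocks so each block of
    a and b is reused for a whole block of c. The tile size here is an
    arbitrary illustrative value, not a tuned one."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Multiply the (i0, p0) block of a by the (p0, j0) block of b.
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip = row_a[p]
                        row_b = b[p]
                        # Innermost loop runs along a row of b and a row of c,
                        # so both accesses are contiguous.
                        for j in range(j0, min(j0 + tile, m)):
                            row_c[j] += a_ip * row_b[j]
    return c
```

Both functions do the same arithmetic and return the same result; only the order of memory accesses differs. In pure Python the interpreter overhead hides any cache effect, so treat this as a picture of the access pattern rather than a benchmark.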