roflmaostc | 2 months ago
Lol, this will potentially be much slower than using the general matmul kernel.
However, I like this kind of research because it really exploits specific hardware configurations and makes things measurably faster (unlike some theoretical matmul improvements). Code specialization is cheap, and if it saves on the order of a few percent, it quickly pays for itself, especially for important operations like matmul.