Firadeoclus | 3 years ago
It turns out that even without the extra training iterations you often lose surprisingly little output quality. In principle you can sparsify a lot more aggressively, but 2-out-of-4 (keeping two nonzero values in every group of four) is so simple and cheap to implement in hardware that more complex schemes are much harder to justify.
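For illustration, here's a minimal NumPy sketch of magnitude-based 2:4 pruning (the function name and the keep-largest-magnitude criterion are my assumptions for the example, not anything mandated by the hardware):

    import numpy as np

    def prune_2_of_4(w: np.ndarray) -> np.ndarray:
        """Zero the two smallest-magnitude values in every group of
        four consecutive weights along the last (K) dimension."""
        assert w.shape[-1] % 4 == 0
        groups = w.reshape(-1, 4)
        # Indices of the two smallest |w| in each group of four.
        drop = np.argsort(np.abs(groups), axis=1)[:, :2]
        pruned = groups.copy()
        np.put_along_axis(pruned, drop, 0.0, axis=1)
        return pruned.reshape(w.shape)

    # Each group of four keeps only its two largest-magnitude entries.
    w = np.array([[0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]])
    print(prune_2_of_4(w))
    # [[ 0.9  0.   0.  -0.7  0.   0.3 -0.4  0. ]]

The fixed 2-of-4 pattern is what makes the hardware side cheap: the nonzero positions in each group can be encoded in a couple of metadata bits, so the multiply units just skip the zeros.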
However, small matmuls (say, <2048 bytes in the K dimension) won't get anywhere near the theoretical 2x speedup.
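A crude roofline sketch of why (hardware numbers are illustrative, roughly A100-class, and the model ignores real-kernel overheads like sparsity metadata and short mainloops): useful FLOPs per byte moved scale with K, so short-K GEMMs are limited by data movement rather than math, and halving the math barely helps.

    # Arithmetic intensity of C[M,N] = A[M,K] @ B[K,N] with fp16 operands.
    def flop_per_byte(M, N, K, bytes_per_el=2):
        flops = 2 * M * N * K
        traffic = bytes_per_el * (M * K + K * N + M * N)
        return flops / traffic

    # Ridge point ~156 FLOP/byte: dense fp16 peak / DRAM bandwidth.
    RIDGE = 312e12 / 2e12
    for K in (128, 8192):
        ai = flop_per_byte(4096, 4096, K)
        bound = "bandwidth" if ai < RIDGE else "compute"
        print(f"K={K}: {ai:.0f} FLOP/byte -> {bound}-bound")
    # K=128:  ~120 FLOP/byte -> bandwidth-bound, sparsity can't help much
    # K=8192: ~1638 FLOP/byte -> compute-bound, sparsity can approach 2x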