Firadeoclus | 3 years ago
It turns out that even without the extra training iterations you often lose surprisingly little output quality. In principle you can sparsify a lot more aggressively, but 2-out-of-4 (keeping two nonzero values in every group of four) is so simple and cheap to implement in hardware that more complex schemes are much harder to justify.
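For illustration, here's a minimal NumPy sketch of magnitude-based 2:4 pruning (the function name and the keep-largest-magnitude criterion are my assumptions for the example, not anything mandated by the hardware):

    import numpy as np

    def prune_2_of_4(w: np.ndarray) -> np.ndarray:
        """Zero the two smallest-magnitude values in every group of
        four consecutive weights along the last (K) dimension."""
        assert w.shape[-1] % 4 == 0
        groups = w.reshape(-1, 4)
        # Indices of the two smallest |w| in each group of four.
        drop = np.argsort(np.abs(groups), axis=1)[:, :2]
        pruned = groups.copy()
        np.put_along_axis(pruned, drop, 0.0, axis=1)
        return pruned.reshape(w.shape)

    # Each group of four keeps only its two largest-magnitude entries.
    w = np.array([[0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]])
    print(prune_2_of_4(w))
    # [[ 0.9  0.   0.  -0.7  0.   0.3 -0.4  0. ]]

The fixed 2-of-4 pattern is what makes the hardware side cheap: the nonzero positions in each group can be encoded in a couple of metadata bits, so the multiply units just skip the zeros.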
However, small matmuls (say, <2048 bytes in the K dimension) won't get anywhere near the theoretical 2x speedup.
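A crude roofline sketch of why (hardware numbers are illustrative, roughly A100-class, and the model ignores real-kernel overheads like sparsity metadata and short mainloops): useful FLOPs per byte moved scale with K, so short-K GEMMs are limited by data movement rather than math, and halving the math barely helps.

    # Arithmetic intensity of C[M,N] = A[M,K] @ B[K,N] with fp16 operands.
    def flop_per_byte(M, N, K, bytes_per_el=2):
        flops = 2 * M * N * K
        traffic = bytes_per_el * (M * K + K * N + M * N)
        return flops / traffic

    # Ridge point ~156 FLOP/byte: dense fp16 peak / DRAM bandwidth.
    RIDGE = 312e12 / 2e12
    for K in (128, 8192):
        ai = flop_per_byte(4096, 4096, K)
        bound = "bandwidth" if ai < RIDGE else "compute"
        print(f"K={K}: {ai:.0f} FLOP/byte -> {bound}-bound")
    # K=128:  ~120 FLOP/byte -> bandwidth-bound, sparsity can't help much
    # K=8192: ~1638 FLOP/byte -> compute-bound, sparsity can approach 2x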