top | item 23119804


walrus | 5 years ago

I'm just speculating (and haven't read the paper yet), but it may be possible to achieve similar speedups on GPUs by pruning the smallest 20% of blocks of size ≥K×K to produce block-sparse weights[0], rather than pruning the smallest 20% of individual weights.

[0] https://openai.com/blog/block-sparse-gpu-kernels/
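
To make the block-pruning idea concrete, here is a rough sketch in plain Python. It is not from the paper or from the linked kernels. The function name `prune_blocks`, the `block` parameter (standing in for K), and the choice to rank blocks by L2 norm are all my assumptions, since the comment doesn't say how a block's "size" would be measured.

```python
import math

def prune_blocks(weights, block, fraction=0.2):
    """Zero the `fraction` of block x block tiles with the smallest L2 norm.

    Hypothetical helper: `weights` is a list of equal-length rows, and
    ranking tiles by L2 norm is an assumption, not something from the paper.
    """
    rows, cols = len(weights), len(weights[0])
    if rows % block or cols % block:
        raise ValueError("matrix dimensions must be multiples of the block size")

    # L2 norm of every tile, keyed by (tile_row, tile_col).
    norms = {}
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            squares = sum(
                weights[i][j] ** 2
                for i in range(r, r + block)
                for j in range(c, c + block)
            )
            norms[(r // block, c // block)] = math.sqrt(squares)

    # Pick the tiles with the smallest norms (rounding the count down).
    n_prune = int(fraction * len(norms))
    dropped = set(sorted(norms, key=norms.get)[:n_prune])

    # Build a new matrix with every element of a dropped tile set to zero.
    pruned = [
        [
            0.0 if (i // block, j // block) in dropped else weights[i][j]
            for j in range(cols)
        ]
        for i in range(rows)
    ]
    return pruned, dropped
```

The zeros then come in whole tiles, which is the layout block-sparse kernels can skip; with per-weight pruning they would be scattered across the matrix. Whether this keeps accuracy as well as per-weight pruning is exactly the open question.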
