madlag | 3 years ago
https://arxiv.org/abs/2109.04838
Using blocks keeps performance good on GPUs while leaving some flexibility in the pruning pattern. And once entirely empty rows and columns are removed, the pruned matrices are actually pretty dense, so this is competitive with structured pruning for speedup, but less "aggressive" on the network during the pruning process. Disclaimer: I am the main co-author.
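To make the "empty rows and columns" point concrete, here is a rough NumPy sketch. It is not the code from the paper: the function names `block_prune` and `compact`, the 2x2 block size, and the plain magnitude score are all assumptions chosen for illustration. It zeroes the lowest-scoring blocks, then drops rows and columns that ended up entirely zero.

```python
import numpy as np

def block_prune(weight, block=(2, 2), keep_ratio=0.5):
    """Zero out the lowest-magnitude blocks (illustrative criterion only)."""
    rows, cols = weight.shape
    br, bc = block
    assert rows % br == 0 and cols % bc == 0, "shape must be divisible by block"
    # View the matrix as a grid of blocks: (rows/br, br, cols/bc, bc).
    blocks = weight.reshape(rows // br, br, cols // bc, bc)
    # One score per block: sum of absolute values inside the block.
    scores = np.abs(blocks).sum(axis=(1, 3))
    n_keep = max(1, int(round(scores.size * keep_ratio)))
    threshold = np.sort(scores, axis=None)[-n_keep]
    block_mask = scores >= threshold  # ties may keep a few extra blocks
    # Expand the per-block mask back to one entry per weight.
    full_mask = np.repeat(np.repeat(block_mask, br, axis=0), bc, axis=1)
    return weight * full_mask

def compact(weight):
    """Drop rows and columns that contain only zeros."""
    keep_rows = np.any(weight != 0, axis=1)
    keep_cols = np.any(weight != 0, axis=0)
    return weight[keep_rows][:, keep_cols], keep_rows, keep_cols

# Toy 4x4 matrix where only the top-left 2x2 block carries large weights.
W = np.full((4, 4), 0.1)
W[:2, :2] = 5.0
pruned = block_prune(W, block=(2, 2), keep_ratio=0.25)
dense, kept_rows, kept_cols = compact(pruned)
print(pruned.shape, "->", dense.shape)
```

In this toy case the pruned 4x4 matrix is 75% zeros, but after compaction what remains is a fully dense 2x2 matrix that an ordinary dense matmul can handle. In a real network the dropped rows and columns correspond to output and input features, so neighbouring layers would need to be sliced with the same `kept_rows` and `kept_cols` masks.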
binarymax|3 years ago