madlag | 3 years ago

May I add another method: block fine-pruning of transformers (pruning while fine-tuning)?

https://arxiv.org/abs/2109.04838

Using blocks preserves good performance on GPUs while still allowing some flexibility in the pruning pattern. And once entirely empty rows and columns are removed, the pruned matrices are actually pretty dense, so the approach is competitive with structured pruning for speedup while being less "aggressive" on the network during the pruning process. Disclaimer: I am the main co-author.
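To make the "empty rows and columns" point concrete, here is a rough sketch in plain Python. It is not the procedure from the paper: it swaps in a one-shot magnitude criterion (keep the tiles with the largest squared norm) instead of pruning during fine-tuning, and the names `block_prune`, `compact`, `block` and `keep_ratio` are made up for illustration.

```python
def block_prune(weights, block, keep_ratio):
    """Zero out the lowest-norm (block x block) tiles of a 2-D list of floats.

    Assumption: tiles are ranked by squared L2 norm, a simple stand-in
    criterion chosen for this sketch only.
    """
    rows, cols = len(weights), len(weights[0])
    assert rows % block == 0 and cols % block == 0, "shape must tile evenly"

    # Score every tile by the sum of its squared entries.
    scores = {}
    for bi in range(0, rows, block):
        for bj in range(0, cols, block):
            scores[(bi, bj)] = sum(
                weights[i][j] ** 2
                for i in range(bi, bi + block)
                for j in range(bj, bj + block)
            )

    # Keep the highest-scoring tiles, at least one.
    n_keep = max(1, round(keep_ratio * len(scores)))
    kept = sorted(scores, key=scores.get, reverse=True)[:n_keep]

    # Copy the kept tiles into an otherwise all-zero matrix.
    pruned = [[0.0] * cols for _ in range(rows)]
    for bi, bj in kept:
        for i in range(bi, bi + block):
            for j in range(bj, bj + block):
                pruned[i][j] = weights[i][j]
    return pruned


def compact(matrix):
    """Drop rows and columns that are entirely zero.

    Returns the smaller dense matrix plus the surviving row and column
    indices, which are needed to slice the neighbouring layers to match.
    """
    row_idx = [i for i, row in enumerate(matrix) if any(v != 0.0 for v in row)]
    col_idx = [
        j for j in range(len(matrix[0])) if any(row[j] != 0.0 for row in matrix)
    ]
    dense = [[matrix[i][j] for j in col_idx] for i in row_idx]
    return dense, row_idx, col_idx


# Toy 4x4 weight matrix with one dominant 2x2 tile in the top-left corner.
W = [
    [5.0, 4.0, 0.1, 0.2],
    [3.0, 6.0, 0.1, 0.1],
    [0.2, 0.1, 0.3, 0.1],
    [0.1, 0.2, 0.1, 0.2],
]
pruned = block_prune(W, block=2, keep_ratio=0.25)
dense, row_idx, col_idx = compact(pruned)
print(dense)             # [[5.0, 4.0], [3.0, 6.0]]
print(row_idx, col_idx)  # [0, 1] [0, 1]
```

In this toy case the 4x4 matrix collapses to a fully dense 2x2 one, which is the effect described above: the block mask leaves whole rows and columns empty, and dropping them gives a smaller ordinary dense matmul rather than a sparse one.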

binarymax | 3 years ago

This looks super interesting! Thanks for the weekend reading :)