top | item 34460786

(no title)

madlag | 3 years ago

May I add another method: block fine-pruning of transformers (pruning while fine-tuning)?

Using blocks keeps performance good on GPUs while leaving some flexibility in the pruning pattern. And once entirely empty rows and columns are removed, the pruned matrices are actually pretty dense, so the approach is competitive with structured pruning for speedup while being less "aggressive" on the network during the pruning process. Disclaimer: I am the main co-author.
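To make the "prune blocks, then drop empty rows and columns" idea concrete, here is a minimal NumPy sketch. It is an illustration only, not the actual method: it assumes a simple block L2-norm criterion applied once to a fixed matrix, whereas the real approach prunes during fine-tuning and may score blocks differently. The helper names `block_prune` and `compact`, the block size, and the `keep_ratio` parameter are all hypothetical.

```python
import numpy as np

def block_prune(weight, block=(2, 2), keep_ratio=0.5):
    """Zero out the lowest-scoring blocks of `weight`.

    Hypothetical helper: scores each block by its L2 norm, which is an
    assumption made for this sketch, not the criterion from the method above.
    """
    rows, cols = weight.shape
    br, bc = block
    if rows % br or cols % bc:
        raise ValueError("matrix shape must be a multiple of the block shape")
    # View the matrix as a grid of (br x bc) blocks.
    blocks = weight.reshape(rows // br, br, cols // bc, bc)
    # One score per block: the L2 norm of its entries.
    scores = np.sqrt((blocks ** 2).sum(axis=(1, 3)))
    n_keep = max(1, int(round(keep_ratio * scores.size)))
    # Keep every block whose score reaches the n_keep-th largest score
    # (ties at the threshold may keep slightly more than n_keep blocks).
    threshold = np.sort(scores, axis=None)[-n_keep]
    mask = scores >= threshold
    pruned = blocks * mask[:, None, :, None]
    return pruned.reshape(rows, cols)

def compact(weight):
    """Drop rows and columns that are entirely zero.

    Returns the smaller matrix plus boolean masks of the kept rows/columns.
    """
    row_keep = np.any(weight != 0, axis=1)
    col_keep = np.any(weight != 0, axis=0)
    return weight[row_keep][:, col_keep], row_keep, col_keep

# Toy 4x4 matrix: one strong 2x2 block, three weak ones.
w = np.full((4, 4), 0.1)
w[:2, :2] = 5.0

pruned = block_prune(w, block=(2, 2), keep_ratio=0.25)
dense, kept_rows, kept_cols = compact(pruned)

print("density before compaction:", np.count_nonzero(pruned) / pruned.size)
print("density after compaction: ", np.count_nonzero(dense) / dense.size)
```

In this toy case only the strong block survives, so the remaining rows and columns are empty and the compacted matrix is fully dense. In a real layer, the row and column masks would also have to be applied to the neighbouring layers so that the shapes still line up.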


binarymax | 3 years ago

This looks super interesting! Thanks for the weekend reading :)