nwoli | 1 year ago

One elegant approach I’ve found for this is https://github.com/mit-han-lab/gan-compression. They basically train an “all in one” network from which you can extract small or large models afterwards (with optional additional finetuning to improve the selected channel-size combinations).
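To make the “extract a smaller model from the trained super-net” idea concrete, here’s a minimal sketch (not the gan-compression codebase’s API; `extract_subnet` and the shapes are illustrative): the super-net is trained so that a prefix of its channels already forms a usable smaller model, so extraction is just slicing the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
# Pretend this is one trained super-net layer mapping 4 -> 8 channels.
W_full = rng.standard_normal((8, 4))
x = rng.standard_normal(4)

def extract_subnet(W, n_channels):
    # Keep only the first n output channels; the super-net training
    # ensures any channel prefix behaves like a smaller trained model.
    return W[:n_channels, :]

W_small = extract_subnet(W_full, 4)  # smaller model: 4 -> 4 channels

# The small model's outputs are exactly a prefix of the full model's.
y_full = W_full @ x
y_small = W_small @ x
assert np.allclose(y_small, y_full[:4])
```

In the real system the extracted sub-network is usually finetuned briefly afterwards, since a given channel combination may not be perfectly optimized inside the shared super-net.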


idontknowmuch | 1 year ago

Ahh, that's an interesting paper, I must have missed that one - thanks for the link. Another paper that recently got a lot of hype is the Matryoshka representation learning paper -- essentially training models with different parameter and output embedding sizes at the same time, basically distillation during training rather than post-training (https://arxiv.org/abs/2205.13147).
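The core Matryoshka trick can be sketched in a few lines (my simplification, not the paper's exact code): the loss is averaged over several nested prefixes of the same embedding, each with its own linear head, so every prefix remains a usable smaller embedding after training. Names like `matryoshka_loss` and `NESTED_DIMS` are mine.

```python
import numpy as np

rng = np.random.default_rng(0)
NESTED_DIMS = (32, 64, 128)  # nested truncation sizes, smallest first
N_CLASSES = 10

# One linear classifier head per truncation size (randomly initialized).
heads = {d: rng.standard_normal((N_CLASSES, d)) * 0.01 for d in NESTED_DIMS}

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def matryoshka_loss(embedding, label):
    # Average cross-entropy over all prefix sizes, so gradients push
    # useful information into the earliest dimensions of the embedding.
    losses = []
    for d in NESTED_DIMS:
        logits = heads[d] @ embedding[:d]
        losses.append(-np.log(softmax(logits)[label]))
    return float(np.mean(losses))

emb = rng.standard_normal(128)
loss = matryoshka_loss(emb, label=3)
assert loss > 0.0
```

At inference you then just truncate the embedding to whatever size your latency or storage budget allows, with no retraining.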