
samplenoise | 4 years ago

Doing some form of PQ at train time is possible, but typically the goal is to make the model’s embedding layers more robust to quantisation [1][2]. I did some work on this in the recommender systems space [3].

[1] https://arxiv.org/abs/1807.04629

[2] http://proceedings.mlr.press/v119/chen20l/chen20l.pdf

[3] http://ceur-ws.org/Vol-2431/paper10.pdf
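For readers unfamiliar with PQ: the core idea is to split each embedding vector into sub-vectors, cluster each subspace independently, and store only small codebooks plus per-vector code indices. The sketch below is a minimal illustration in NumPy (function names and parameters are my own, not from the papers above), not the training-time variant the papers describe:

```python
import numpy as np

def product_quantize(emb, n_subspaces=4, n_codes=16, n_iter=10, seed=0):
    """Compress an embedding matrix with product quantization:
    split vectors into sub-vectors, run k-means in each subspace,
    and keep only the codebooks plus per-vector code indices."""
    rng = np.random.default_rng(seed)
    n, d = emb.shape
    assert d % n_subspaces == 0, "dim must divide evenly into subspaces"
    sub_d = d // n_subspaces
    codebooks = np.empty((n_subspaces, n_codes, sub_d))
    codes = np.empty((n, n_subspaces), dtype=np.int64)
    for s in range(n_subspaces):
        sub = emb[:, s * sub_d:(s + 1) * sub_d]
        # initialise centroids from randomly chosen data points
        centroids = sub[rng.choice(n, n_codes, replace=False)].copy()
        assign = np.zeros(n, dtype=np.int64)
        for _ in range(n_iter):
            # assign each sub-vector to its nearest centroid
            dist = ((sub[:, None, :] - centroids[None]) ** 2).sum(-1)
            assign = dist.argmin(1)
            # update centroids (skip clusters that emptied out)
            for k in range(n_codes):
                mask = assign == k
                if mask.any():
                    centroids[k] = sub[mask].mean(0)
        codebooks[s], codes[:, s] = centroids, assign
    return codebooks, codes

def reconstruct(codebooks, codes):
    """Rebuild approximate embeddings by concatenating the codebook
    entries selected by each vector's codes."""
    return np.concatenate(
        [codebooks[s][codes[:, s]] for s in range(codebooks.shape[0])],
        axis=1)
```

Storage drops from `n * d` floats to `n * n_subspaces` small integers plus the codebooks; train-time approaches like those in [1][2] go further by making the embeddings themselves easy to quantise this way.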


youssefabdelm | 4 years ago

Really impressive and intriguing work, thanks for sharing!

I'd be specifically curious about applying PQ to transformers. It's quite depressing to me that ultra-large-scale model training is inaccessible to the average poor person like me. The dream would be to find some way of compressing the parameter count significantly (or otherwise making models more efficient) so that it becomes possible to train and/or run 100-billion or 1/10/100-trillion+ parameter models on, say, Colab, as 'crazy' as that probably sounds to most.