
tylerekahn | 2 years ago

It’s actually as low as 0.01% of the original weights.

From the LoRA paper:

> When the pre-trained model is GPT-3 175B, the number of trainable parameters |Θ| can be as small as 0.01% of |Φ₀|.
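To see where a number like that comes from: LoRA freezes a d × k weight matrix and trains only a rank-r update B @ A (B is d × r, A is r × k), so the trainable count is r(d + k) instead of dk. A minimal sketch, using GPT-3's hidden size of 12288 as an illustrative per-matrix example (the paper's 0.01% figure is for the whole model, not one matrix):

```python
def lora_fraction(d: int, k: int, r: int) -> float:
    """Fraction of a d x k matrix's parameters that LoRA trains at rank r."""
    full = d * k          # parameters in the frozen weight matrix
    trainable = r * (d + k)  # parameters in the low-rank factors B and A
    return trainable / full

# For a square 12288 x 12288 projection at rank r=1, the trainable
# share is 2r/d, i.e. roughly 0.016% of that single matrix.
print(f"{lora_fraction(12288, 12288, 1):.4%}")
```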
