ggerganov|2 years ago
How would you know whether the trick actually reduces the outliers in the weights? Even if overall transformer quality does not improve, having fewer outliers would still be very beneficial, since it allows more accurate quantization of the data.
danielmarkbruce|2 years ago
The "how" is pretty straightforward.
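To make the "how" concrete, here is a minimal sketch of one way to check it: compute a couple of outlier statistics over a weight tensor and compare the round-trip quantization error. Everything here is an assumption for illustration: the function names `outlier_stats` and `absmax_quant_rmse` are made up, the weights are synthetic Gaussian samples with a few hand-injected large values rather than anything from a real model, and symmetric absmax rounding stands in for whatever quantization scheme you actually use. In practice you would run the same measurements on checkpoints trained with and without the trick.

```python
import math
import random


def outlier_stats(weights):
    """Return (excess kurtosis, max|w| / RMS) for a flat list of weights."""
    n = len(weights)
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / n
    m4 = sum((w - mean) ** 4 for w in weights) / n
    rms = math.sqrt(sum(w * w for w in weights) / n)
    kurtosis = m4 / (var * var) - 3.0  # roughly 0 for a Gaussian
    peak_ratio = max(abs(w) for w in weights) / rms
    return kurtosis, peak_ratio


def absmax_quant_rmse(weights, bits=8):
    """RMSE after symmetric absmax round-to-nearest quantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax  # one scale for the whole list
    sq_err = [(round(w / scale) * scale - w) ** 2 for w in weights]
    return math.sqrt(sum(sq_err) / len(sq_err))


# Synthetic stand-ins for two checkpoints (assumption, not real model data).
rng = random.Random(0)
baseline = [rng.gauss(0.0, 1.0) for _ in range(4096)]
with_outliers = list(baseline)
for i in range(0, 4096, 512):
    with_outliers[i] = 40.0  # hypothetical injected outliers

for name, w in (("baseline", baseline), ("with_outliers", with_outliers)):
    kurt, peak = outlier_stats(w)
    print(name, round(kurt, 2), round(peak, 2), absmax_quant_rmse(w, bits=4))
```

If the trick works as hoped, the checkpoint trained with it should look more like `baseline` here: lower kurtosis, a smaller peak-to-RMS ratio, and a lower quantization error at the same bit width.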
p1esk|2 years ago
ggerganov|2 years ago
Agree - the "how" is straightforward