ggerganov | 2 years ago

> I don't recall the details exactly, but I don't think it ever did very much.

How would you have known whether the trick actually reduces the outliers in the weights? Even if overall transformer quality does not improve, having fewer outliers as a result would be very beneficial, since it allows more accurate quantization of the data.
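To illustrate why outliers matter for quantization, here is a minimal sketch, assuming simple symmetric absmax rounding to 8-bit integers and synthetic Gaussian "weights". The helper names, the 0.02 standard deviation, and the injected value of 2.0 are all made up for illustration and do not come from any real model or library.

```python
import random

def absmax_quantize_roundtrip(values, bits=8):
    """Quantize to signed integers with one absmax scale, then dequantize."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax      # a single outlier inflates this
    return [round(v / scale) * scale for v in values]

def mean_abs_error(values, bits=8):
    """Average absolute round-trip error over all values."""
    restored = absmax_quantize_roundtrip(values, bits)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)

rng = random.Random(0)
# Synthetic stand-in for a weight tensor (assumption, not real weights).
weights = [rng.gauss(0.0, 0.02) for _ in range(4096)]
# Same values, but with one large-magnitude outlier injected.
with_outlier = weights[:-1] + [2.0]

err_clean = mean_abs_error(weights)
err_outlier = mean_abs_error(with_outlier)
print(f"mean abs error without outlier: {err_clean:.6f}")
print(f"mean abs error with outlier:    {err_outlier:.6f}")
```

A single large value stretches the scale, so every other value is rounded on a much coarser grid. Block-wise schemes limit the damage to the block containing the outlier, but the effect within that block is the same.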

danielmarkbruce|2 years ago

Are you asking "why would you have bothered to look"?

The "how" is pretty straightforward.

p1esk|2 years ago

He's questioning the statement "I don't think [the trick] ever did very much", because no one has yet looked at whether the trick helps reduce outliers in very large models. If it does help with this, as the blog author believes, then it is indeed a very useful trick.

ggerganov|2 years ago

Yes, I assumed that checking the weights for the presence and number of outliers is not something that is usually done, so an effect on them could be overlooked. If my assumption is wrong and researchers do usually look at such metrics, then my question is not very relevant.

Agreed, the "how" is straightforward.
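For concreteness, one way the check could look. This is only a sketch under stated assumptions: the three metrics (largest magnitude relative to the standard deviation, excess kurtosis, and the fraction of values beyond k standard deviations) are illustrative choices, the function name and the threshold k=6 are hypothetical, and the "weights" are synthetic rather than taken from a real model.

```python
import math
import random

def outlier_stats(weights, k=6.0):
    """Summarize how heavy-tailed a flat list of weights is."""
    n = len(weights)
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / n
    std = math.sqrt(var)
    m4 = sum((w - mean) ** 4 for w in weights) / n   # fourth central moment
    return {
        # Largest magnitude in units of the standard deviation.
        "max_abs_over_std": max(abs(w) for w in weights) / std,
        # Roughly 0 for a Gaussian, large when there are heavy tails.
        "excess_kurtosis": m4 / var ** 2 - 3.0,
        # Share of values further than k standard deviations from the mean.
        "frac_beyond_k_std": sum(abs(w - mean) > k * std for w in weights) / n,
    }

rng = random.Random(0)
# Synthetic baseline (assumption): Gaussian values, no outliers injected.
baseline = [rng.gauss(0.0, 0.02) for _ in range(4096)]
# Same values with four large-magnitude entries injected.
injected = baseline[:-4] + [1.0, -1.0, 1.0, -1.0]

stats_base = outlier_stats(baseline)
stats_out = outlier_stats(injected)
print(stats_base)
print(stats_out)
```

Running the same function per layer on two checkpoints, one trained with the trick and one without, would give a direct comparison. Whether the difference shows up in practice is exactly the open question above.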