bmh|2 years ago
It's interesting that the standard "K" (the number of elements that share a scale) is 32. That seems to imply that the neural network will somehow learn to group weights of similar magnitude along those 32-element boundaries.
Does anybody understand how that works? I mean, what is the mechanism that would naturally cause the model to arrange its weights so that each K-element block can share a single scale?
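To make the question concrete, here is a minimal sketch of what I understand by "shared scale". The function names, the power-of-two scale, and the small signed-integer element range (`max_q`) are my own assumptions for illustration, not taken from any particular spec or library.

```python
import math

K = 32  # block size: number of consecutive elements sharing one scale


def quantize_blocks(weights, k=K, max_q=7):
    """Split weights into consecutive blocks of k; each block gets one scale.

    Assumption: the scale is a power of two chosen from the block's largest
    magnitude, and elements are stored as integers in [-max_q, max_q].
    """
    blocks = []
    for start in range(0, len(weights), k):
        block = weights[start:start + k]
        amax = max(abs(w) for w in block)
        # Smallest power-of-two scale such that amax / scale <= max_q.
        exp = math.ceil(math.log2(amax / max_q)) if amax > 0 else 0
        scale = 2.0 ** exp
        q = [max(-max_q, min(max_q, round(w / scale))) for w in block]
        blocks.append((scale, q))
    return blocks


def dequantize_blocks(blocks):
    """Reconstruct approximate weights: each element times its block's scale."""
    return [scale * v for scale, q in blocks for v in q]


if __name__ == "__main__":
    weights = [0.01 * i for i in range(64)]  # two blocks of 32
    for scale, q in quantize_blocks(weights):
        print(scale, q[:4], "...")
```

In this sketch the scale is simply derived from whatever values happen to fall in each block, so a block that mixes very large and very small weights loses precision on the small ones. That is exactly why I'm wondering whether training pushes similar magnitudes into the same block.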
buildbot|2 years ago