stu2b50|2 years ago
Per the original paper, it has been found empirically that neural network weights often have low intrinsic rank. It follows, then, that the change in the weights as you train also has low intrinsic rank, which means you should be able to represent it with a lower-rank matrix.

grph123dot|2 years ago
(1) https://en.wikipedia.org/wiki/Low-rank_approximation
Edited: By the way, it seems to me that there is an error in the Wikipedia page: if the low-rank approximation uses a larger rank, then the bound on the error should decrease, but on that page the error increases.

grph123dot|2 years ago
>> that the change in the weights as you train also have low intrinsic rank
It seems that the initial weight matrix W has a low-rank approximation A, which implies that the difference E = W - A is small. It also seems that PCA fails when E is sparse, because PCA is designed to be optimal when the error is Gaussian.
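A quick numerical check of the point about the error bound: by the Eckart-Young theorem, the spectral-norm error of the best rank-k approximation equals the (k+1)-th singular value, so it can only shrink as the rank grows. A minimal sketch in NumPy (all names and the random matrix here are illustrative, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((50, 30))  # stand-in for a weight matrix

# Thin SVD: s holds singular values in descending order.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

def rank_k_approx(k):
    # Truncated SVD: keep only the k largest singular values/vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Spectral-norm error of the best rank-k approximation is sigma_{k+1}.
errors = [np.linalg.norm(W - rank_k_approx(k), 2) for k in range(1, 30)]

# Error is non-increasing as the rank grows...
assert all(e2 <= e1 + 1e-9 for e1, e2 in zip(errors, errors[1:]))
# ...and each error matches the next singular value (Eckart-Young).
assert np.allclose(errors, s[1:30])
```

So a larger rank should indeed give a smaller (or equal) error bound, consistent with the objection raised above.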
seydor|2 years ago
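Going back to stu2b50's point about representing the weight change with a lower-rank matrix: a minimal sketch of the LoRA-style factorization, where the update to a frozen weight matrix is stored as a product of two thin matrices of rank r. All sizes and names below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

d_out, d_in, r = 64, 32, 4  # illustrative sizes; r << min(d_out, d_in)

rng = np.random.default_rng(1)
W0 = rng.standard_normal((d_out, d_in))  # frozen pretrained weights
B = rng.standard_normal((d_out, r))      # trainable down-projection
A = rng.standard_normal((r, d_in))       # trainable up-projection

# The weight update is never materialized at full rank: dW = B @ A.
dW = B @ A
assert np.linalg.matrix_rank(dW) <= r

def forward(x):
    # Effective weight is W0 + B @ A; only A and B would be trained.
    return (W0 + dW) @ x

# Parameter savings: full update vs. factored update.
full_params = d_out * d_in          # 2048
lora_params = r * (d_out + d_in)    # 384
assert lora_params < full_params
```

The design point is that the factored form stores r * (d_out + d_in) numbers instead of d_out * d_in, which is where the savings come from when the update really is (close to) low rank.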