top | item 35291115

(no title)

>> that the change in the weights as you train also have low intrinsic rank

It seems that the initial matrix of weights has a low rank approximation A and this implies that the difference E = W - A is small, also it seems that PCA fails when E is sparse because PCA is designed to be optimum when the error is gaussian.

discuss

stu2b50|2 years ago

In terms of PCA, PCA is also quite expensive computationally. Additionally, you'd probably have to do SVD instead.

Since the weights are derived from gradient descent, yeah we don't really know what the distributions would be.

A random projection empirically works quite well for very high dimensions, and is of course very cheap computationally.