Not sure what you mean by large amount of words. You can fit a PCA on millions of vectors relatively performantly, then inference from it is just a matmul.
Not true. You need a distance matrix (for classical PCA it's a covariance matrix), which scales quadratically with the number of points you want to compare.
If you have 1 Mio. vectors, each creating a float entry in the matrix, you will end up with approx (10^6)^2 / 2 unique values, which is roughly 2000Gb of memory.
minimaxir|2 years ago
karxxm|2 years ago