top | item 38024758

(no title)

karxxm | 2 years ago

Unfortunately this is not feasible with a large number of words due to the quadratic scaling. But thanks for the response!

minimaxir | 2 years ago

Not sure what you mean by a large number of words. You can fit a PCA on millions of vectors with reasonable performance, and inference afterwards is just a matmul.
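
For illustration, here is a minimal NumPy-only sketch of that workflow. It is not any particular library's implementation: the helper names `fit_pca` and `transform`, the random data, and the sizes (10,000 vectors of dimension 64, 2 components) are all assumptions made for the example. It fits by eigendecomposing the feature covariance, whose size depends on the vector dimension rather than on the number of vectors, and then projects with a single matrix multiplication.

```python
import numpy as np


def fit_pca(X, k):
    """Fit PCA on X (n_samples x n_features); return the mean and top-k components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Feature covariance: shape (n_features, n_features), independent of n_samples.
    cov = (Xc.T @ Xc) / (X.shape[0] - 1)
    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, order]  # shape (n_features, k)
    return mean, components


def transform(X, mean, components):
    """Project vectors onto the fitted components: a single matmul."""
    return (X - mean) @ components


# Hypothetical data: 10,000 random vectors of dimension 64.
rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 64)).astype(np.float32)

mean, components = fit_pca(X, k=2)
Z = transform(X, mean, components)
print(Z.shape)
```

For very large inputs, an incremental or randomized solver would likely be a better fit than a dense eigendecomposition; this sketch only shows the shape of the computation.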

karxxm | 2 years ago

Not true. You need a distance matrix (for classical PCA it's a covariance matrix), which scales quadratically with the number of points you want to compare. If you have 1 million vectors, with each pair contributing one float entry to the matrix, you end up with approximately (10^6)^2 / 2 unique values, which at 4 bytes per float is roughly 2000 GB of memory.
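
The arithmetic behind that estimate can be checked with a short sketch. The function name `pairwise_matrix_bytes` is hypothetical, and the 4 bytes per value assumes 32-bit floats; it counts only the unique off-diagonal entries of a symmetric pairwise matrix over n points.

```python
def pairwise_matrix_bytes(n, bytes_per_value=4):
    """Bytes needed for the unique entries of a symmetric n x n pairwise matrix.

    Assumes one value per unordered pair of points (diagonal excluded)
    and 32-bit floats by default.
    """
    unique_values = n * (n - 1) // 2
    return unique_values * bytes_per_value


n = 1_000_000
size_bytes = pairwise_matrix_bytes(n)
print(f"{size_bytes / 1e9:.0f} GB")  # about 2000 GB for one million points
```

With 64-bit floats (`bytes_per_value=8`) the same matrix would need about twice as much.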