top | item 43082476


ganyu|1 year ago

Bear in mind that "any two high-dimensional vectors are almost always orthogonal".


frizkie|1 year ago

Is this better rephrased as “any two vectors in a high-dimensional space are almost always functionally orthogonal”?

I have mostly a layperson's understanding of this idea, but I would assume it would be false to say that they are typically _entirely_ orthogonal?

aithrowawaycomm|1 year ago

Yes, a more precise way to phrase this is that the expected magnitude of the dot product between two random unit vectors tends to 0 as the dimension tends to infinity (I think the scaling is 1/sqrt(dimension)). But the probability of drawing two truly orthogonal vectors at random (over the reals) is zero: the dot product will be very small but nonzero.
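A quick numerical sketch of that scaling claim (Python/NumPy; the function name and trial count are my own choices): draw pairs of random unit vectors and measure the average magnitude of their dot product as the dimension grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_abs_dot(d, trials=2000):
    # Draw `trials` pairs of random unit vectors in dimension d
    # and return the mean absolute dot product.
    u = rng.standard_normal((trials, d))
    v = rng.standard_normal((trials, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return np.abs((u * v).sum(axis=1)).mean()

for d in (10, 100, 1000, 10000):
    # The measured mean tracks the 1/sqrt(d) reference scale,
    # shrinking toward 0 but never hitting it exactly.
    print(d, mean_abs_dot(d), 1 / np.sqrt(d))
```

Note the values shrink roughly tenfold for every hundredfold increase in dimension, consistent with 1/sqrt(dimension), and stay nonzero.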

That said, for sparse high-dimensional datasets, which aren't proper vector spaces, the probability of being truly orthogonal can be quite high - e.g. if half your vectors have support totally disjoint from the other half's, then the probability is at least 50-50.
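A toy illustration of the disjoint-support case (my own construction): when two sparse vectors have no nonzero dimensions in common, every term of the dot product has a zero factor, so the result is exactly zero rather than merely small.

```python
import numpy as np

# Two sparse vectors whose supports (nonzero positions) don't overlap.
a = np.zeros(1000)
b = np.zeros(1000)
a[:500] = np.random.rand(500)   # support: dimensions 0..499
b[500:] = np.random.rand(500)   # support: dimensions 500..999

# Every product a[i] * b[i] has at least one zero factor,
# so the dot product is exactly 0.0, not just close to it.
print(a @ b)  # exactly 0.0
```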

Note that ML/LLM practitioners use "approximate orthogonality" anyway.

esafak|1 year ago

The visualization is useless. If the 2D embeddings were any good, they might be useful to R1's developers, but still not to end users. What am I supposed to do with it?

higuidebot|1 year ago

No need to do anything in particular! Perhaps it's just interesting to observe.

TaurenHunter|1 year ago

So the trick is to pick the dimensions that are relevant and discard the rest when calculating the distance.
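That idea could be sketched like this (a minimal illustration; the function name, mask argument, and example values are hypothetical): compute the distance only over a chosen subset of dimensions and ignore the rest.

```python
import numpy as np

def masked_distance(x, y, relevant):
    # Euclidean distance computed only over the selected dimensions;
    # all other dimensions are discarded.
    idx = np.asarray(relevant)
    return np.linalg.norm(x[idx] - y[idx])

x = np.array([1.0, 2.0, 100.0])
y = np.array([1.5, 2.5, -50.0])

# Using only dimensions 0 and 1 ignores the dominant third dimension,
# which would otherwise swamp the distance.
print(masked_distance(x, y, [0, 1]))
```

Which dimensions count as "relevant" is the hard part, of course; this only shows the mechanics of discarding the rest.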

dehrmann|1 year ago

Alternatively, in a high-dimensional space, everyone sits in their own corner.