bkfh | 2 years ago

Can someone ELI5 this?

A vector is a position in a space with some number of dimensions. In 2D space a vector is a point (x, y), like (1, 3) or (-2.5, 7.39). We can also do simple math on vectors, such as addition, which works one coordinate at a time: (1, 3) + (2, -1) = (3, 2).
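If it helps to see that addition as code, here is a minimal Python sketch. The helper name `add` is just something I made up for illustration:

```python
def add(u, v):
    """Add two vectors of the same length, one coordinate at a time."""
    return tuple(a + b for a, b in zip(u, v))

result = add((1, 3), (2, -1))
print(result)  # (3, 2)
```

The same function works unchanged for vectors with hundreds of dimensions, as long as both have the same length.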

LLMs treat language as combinations of vectors with a very large number of dimensions: (x, y, z, a, b, c, d, ...). The neat thing is that we can combine these just like the 2D vectors and get meaningful results. The classic example: take the vector for "King", subtract the one for "Man", and add the one for "Woman", and you land close to the vector for "Queen"!
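Here is a toy sketch of that word arithmetic in its usual form, king - man + woman. Big caveat: the 2D vectors below are numbers I made up by hand (first coordinate roughly "royalty", second roughly "femaleness"). Real models learn vectors with hundreds or thousands of dimensions, and the names `toy_vectors`, `cosine` and `nearest` are purely illustrative.

```python
import math

# Hand-made toy vectors: (royalty, femaleness). Not from any real model.
toy_vectors = {
    "king":  (0.9, 0.1),
    "queen": (0.9, 0.9),
    "man":   (0.1, 0.1),
    "woman": (0.1, 0.9),
    "apple": (0.0, 0.5),
}

def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(target, exclude):
    """Return the word whose vector is most similar to target."""
    candidates = [w for w in toy_vectors if w not in exclude]
    return max(candidates, key=lambda w: cosine(toy_vectors[w], target))

king, man, woman = toy_vectors["king"], toy_vectors["man"], toy_vectors["woman"]

# king - man + woman, one coordinate at a time
target = tuple(k - m + w for k, m, w in zip(king, man, woman))

answer = nearest(target, exclude={"king", "man", "woman"})
print(answer)
```

Excluding the three input words from the search is the usual convention for this kind of demo, since otherwise one of the inputs often turns out to be the closest match.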

Once you know this, you can extrapolate and look for ways to categorize groups of vectors and combine them in new ways. As I read it, this research is about finding the vector weights for text from specific time periods (e.g. January of 2021) and comparing them to the vectors for text from a different period (e.g. March of 2021). It seems that all the operations are still meaningful. You can even do something like averaging the vectors for January and March and get ones that look like the vectors for February!
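The averaging idea can be sketched the same way. To be clear, this is only my illustration of the arithmetic, not the researchers' code: the two "month" vectors are made-up numbers, and `interpolate` is a hypothetical helper name.

```python
def interpolate(u, v, t):
    """Blend two vectors: t=0 gives u, t=1 gives v, t=0.5 is the average."""
    return tuple((1 - t) * a + t * b for a, b in zip(u, v))

# Made-up stand-ins for vectors derived from January and March text.
january = (1.0, 4.0, -2.0)
march = (3.0, 0.0, 2.0)

# The halfway point is a guess at what a February vector might look like.
february_guess = interpolate(january, march, 0.5)
print(february_guess)  # (2.0, 2.0, 0.0)
```

Whether the midpoint actually behaves like a February vector is the empirical claim the research is making; the code only shows what "averaging" means here.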