top | item 18869827

mlucy | 7 years ago

A word embedding transforms a word into a series of numbers, with the property that similar words (e.g. "dog" and "canine") produce similar numbers.

You can have embeddings for other things, such as pictures, where you would want the property that e.g. two pictures of dogs produce more similar numbers than a picture of a dog and a picture of a cat.
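A minimal sketch of what "similar words produce similar numbers" means in practice. The vectors here are hand-picked and purely illustrative (real embeddings are learned and have hundreds of dimensions); similarity is measured with cosine similarity, the standard choice:

```python
import math

# Hypothetical, hand-picked 3-dimensional embeddings purely for illustration;
# real embeddings (e.g. from word2vec) are learned from data.
embeddings = {
    "dog":    [0.9, 0.8, 0.1],
    "canine": [0.8, 0.9, 0.2],
    "cat":    [0.1, 0.2, 0.9],
}

def cosine(u, v):
    # Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal ones.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Similar words should produce similar numbers:
print(cosine(embeddings["dog"], embeddings["canine"]))  # close to 1
print(cosine(embeddings["dog"], embeddings["cat"]))     # much smaller
```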

discuss

order

perfmode | 7 years ago

Ah. Sounds like a vector space. How does one select a basis?

leereeves | 7 years ago

It is indeed a vector space. You don't really choose a basis; an ML tool like word2vec [1] chooses one for you. And, like most advanced applications of ML, exactly how it works is a mystery.

1: https://en.wikipedia.org/wiki/Word2vec
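One reason no particular basis matters: cosine similarity only depends on dot products, so any orthogonal change of basis applied to all the embeddings leaves every pairwise similarity unchanged. A toy demonstration with random vectors (purely illustrative, not word2vec output):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "embeddings": 5 words in 4 dimensions (random, purely illustrative).
E = rng.normal(size=(5, 4))

def cosine_matrix(M):
    # Pairwise cosine similarities between the rows of M.
    U = M / np.linalg.norm(M, axis=1, keepdims=True)
    return U @ U.T

# A random orthogonal matrix Q (via QR decomposition) acts as a change of basis.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))

# Rotating every embedding by Q leaves all similarities unchanged,
# which is why no particular basis is privileged.
print(np.allclose(cosine_matrix(E), cosine_matrix(E @ Q)))  # True
```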

> The reasons for successful word embedding learning in the word2vec framework are poorly understood. Goldberg and Levy point out that the word2vec objective function causes words that occur in similar contexts to have similar embeddings (as measured by cosine similarity) and note that this is in line with J. R. Firth's distributional hypothesis. However, they note that this explanation is "very hand-wavy" and argue that a more formal explanation would be preferable.
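Firth's distributional hypothesis ("you shall know a word by the company it keeps") can be illustrated without word2vec at all, using raw co-occurrence counts on a toy corpus. The corpus and window size below are made up for illustration:

```python
import math
from collections import Counter

# A tiny made-up corpus; "dog" and "canine" appear in similar contexts.
corpus = [
    "the dog barked at the stranger",
    "the canine barked at the mailman",
    "the cat slept on the sofa",
]

def context_vector(word, window=2):
    # Count words co-occurring within `window` positions of `word`.
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, t in enumerate(tokens):
            if t == word:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        counts[tokens[j]] += 1
    return counts

def cosine(c1, c2):
    # Cosine similarity between two sparse count vectors.
    dot = sum(c1[w] * c2[w] for w in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2)

# Words used in similar contexts end up with similar count vectors:
print(cosine(context_vector("dog"), context_vector("canine")))  # larger
print(cosine(context_vector("dog"), context_vector("cat")))     # smaller
```

word2vec learns dense vectors rather than counting directly, but (per Goldberg and Levy's observation quoted above) its objective pushes words that occur in similar contexts toward similar embeddings, much like this count-based picture.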