In ML everything is a tradeoff. The article strongly suggests using dot product similarity, and it is a great metric in some situations, but it comes with issues of its own:
- not normalized (unlike cosine similarity)
- heavily favors large vectors
- unbounded output
- In short: do not apply any similarity metric carelessly (see the sketch after this list).
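To make the first three points concrete, here is a minimal sketch in plain NumPy, with toy vectors of my own choosing (not from the article): the dot product grows without bound as vector magnitude grows, while cosine similarity measures direction only.

```python
import numpy as np

def dot_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Unnormalized and unbounded: scales with the magnitudes of a and b.
    return float(np.dot(a, b))

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Normalized to [-1, 1]: depends only on the angle between a and b.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0])
b = np.array([1.0, 0.1])   # almost the same direction as a
big_b = 100.0 * b          # same direction, 100x the magnitude

print(dot_sim(a, b), dot_sim(a, big_b))        # 1.0 vs 100.0: magnitude dominates
print(cosine_sim(a, b), cosine_sim(a, big_b))  # ~0.995 for both: direction only
```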
microtonal|1 year ago
(The catch is that during training a logistic regression is fit on the dot product of the word and context vectors, and the two sets of vectors end up highly similar. People would even sum the word and context vectors, or train with the word and context vectors tied to the same parameters, without much loss of quality.)
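For readers unfamiliar with the setup: this refers to the skip-gram negative-sampling objective from word2vec, where the logit of the logistic regression is the word-context dot product. A rough sketch follows; the shapes and names (vocab, dim, W, C) are illustrative, not taken from any particular implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 1000, 50
W = rng.normal(scale=0.1, size=(vocab, dim))  # word (input) embeddings
C = rng.normal(scale=0.1, size=(vocab, dim))  # context (output) embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pair_loss(word_id, ctx_id, neg_ids):
    """Negative-sampling loss for one (word, context) pair: a logistic
    regression whose logit is the word-context dot product."""
    w = W[word_id]
    pos = -np.log(sigmoid(np.dot(w, C[ctx_id])))     # true context: label 1
    neg = -np.sum(np.log(sigmoid(-(C[neg_ids] @ w))))  # sampled noise: label 0
    return pos + neg

loss = pair_loss(3, 17, rng.integers(0, vocab, size=5))

# The variants the comment mentions: embed with W alone, embed with the
# sum W + C, or train with the two tables tied (C is W).
embed_sum = W + C
```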