xplt | 1 year ago

On point.

Just because the L2 norm yields the same rankings as cosine similarity in the particular case of normalized embeddings doesn't mean that any other L-norm, or any other measure commonly used in (un)supervised learning or information retrieval, is a viable alternative for the problem at hand (which, by the way, also had to be guessed).
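That equivalence follows from the identity ||a - b||^2 = 2 - 2*cos(a, b) for unit vectors, so L2 distance is a monotone (decreasing) function of cosine similarity and both produce the same ranking. A minimal sketch (toy vectors of my own choosing, not from any real embedding model):

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    # For unit vectors the dot product IS the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = normalize([0.2, 0.9, 0.4])
docs = [normalize(v) for v in ([0.1, 1.0, 0.3],
                               [0.9, 0.1, 0.2],
                               [0.5, 0.5, 0.5])]

# Rank documents: descending cosine similarity vs. ascending L2 distance.
by_cos = sorted(range(len(docs)), key=lambda i: -cosine(query, docs[i]))
by_l2 = sorted(range(len(docs)), key=lambda i: l2(query, docs[i]))
print(by_cos == by_l2)  # prints True: identical rankings for unit vectors
```

Note that the identity, and hence the ranking equivalence, breaks as soon as the vectors are not normalized, which is exactly why it doesn't generalize to arbitrary L-norms or unnormalized embeddings.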

Looking at the "history" of this development (e.g. bag-of-words model, curse of dimensionality etc.) provides a solid explanation for why we've ended up using embeddings and cosine similarity for retrieval.

That said, I'm curious to see advancements and new approaches in this area. This might sound snarky, but I genuinely commend the author for doing what I haven't managed to do so far: writing down their view of the world and putting it out for public scrutiny.
