dnc | 6 years ago
If the primitive embedding unit is a search query instead of a word, then I assume the query vector model is trained on a limited dictionary of queries. Does that imply the trained model can only encode search queries that are already present in its dictionary? If not, it would be interesting to hear more about how the closed-dictionary problem was solved.
eggie5 | 6 years ago
mlthoughts2018 | 6 years ago
One of the most common "off the shelf" search solutions is to train an embedding for X (where X is whatever you want to search) using an implicit similarity or triplet loss over examples with natural positive labels (such as proximity in the word2vec case), plus random negative sampling, and then serve the embedding from an exact or approximate nearest-neighbor index.
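A minimal sketch of that recipe, with made-up toy data: items with "naturally positive" pairs are trained with a hinge triplet loss and random negative sampling, then queried with an exact nearest-neighbor lookup. (Everything here, including the item count, margin, and learning rate, is a hypothetical illustration, not the pipeline from the post.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical data): 6 items, embedding dim 4.
# Pairs (0, 1), (2, 3), (4, 5) are "naturally positive", e.g. co-occurring.
n_items, dim = 6, 4
emb = rng.normal(scale=0.1, size=(n_items, dim))
positives = [(0, 1), (2, 3), (4, 5)]
margin, lr = 0.5, 0.1

def triplet_loss_and_grads(emb, a, p, n):
    """Hinge triplet loss on squared Euclidean distance, with manual grads."""
    d_ap = emb[a] - emb[p]          # anchor -> positive
    d_an = emb[a] - emb[n]          # anchor -> negative
    loss = np.sum(d_ap**2) - np.sum(d_an**2) + margin
    if loss <= 0:                   # margin already satisfied: no update
        return 0.0, None
    grads = {a: 2 * (d_ap - d_an), p: -2 * d_ap, n: 2 * d_an}
    return loss, grads

for step in range(200):
    a, p = positives[rng.integers(len(positives))]  # anchor + its positive
    n = int(rng.integers(n_items))                  # random negative sample
    if n in (a, p):
        continue
    loss, grads = triplet_loss_and_grads(emb, a, p, n)
    if grads:
        for idx, g in grads.items():
            emb[idx] -= lr * g

# Exact nearest-neighbor lookup: item 0's closest other item
# should end up being its positive partner, item 1.
dists = np.sum((emb - emb[0])**2, axis=1)
dists[0] = np.inf
print(int(np.argmin(dists)))
```

In production the brute-force lookup at the end would be replaced by an approximate index (e.g. HNSW or IVF), but the training side is the same shape.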
In fact, it's even very "off the shelf" to use Siamese networks for multi-modal embeddings, for example simultaneously learning embeddings that place semantically similar queries and food images into the same ANN vector space.
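The multi-modal version can be sketched as a two-tower model: one tower per modality, trained so that paired (query, image) features land close together in a shared space. Again, this is a toy illustration with invented features and linear towers, not anyone's production model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical multi-modal toy: 3 (query, image) positive pairs.
# Query features have dim 5, image features dim 7; shared space has dim 3.
q_feats = rng.normal(size=(3, 5))
img_feats = rng.normal(size=(3, 7))
Wq = rng.normal(scale=0.1, size=(3, 5))   # query tower (linear)
Wi = rng.normal(scale=0.1, size=(3, 7))   # image tower (linear)
margin, lr = 1.0, 0.05

for step in range(300):
    i = int(rng.integers(3))              # anchor query + its paired image
    j = int(rng.integers(3))              # random negative image
    if j == i:
        continue
    zq, zp, zn = Wq @ q_feats[i], Wi @ img_feats[i], Wi @ img_feats[j]
    loss = np.sum((zq - zp)**2) - np.sum((zq - zn)**2) + margin
    if loss <= 0:                         # margin satisfied: skip update
        continue
    # Manual gradients of the hinge triplet loss w.r.t. both towers.
    Wq -= lr * np.outer(2 * (zn - zp), q_feats[i])
    Wi -= lr * (np.outer(-2 * (zq - zp), img_feats[i])
                + np.outer(2 * (zq - zn), img_feats[j]))

# After training, paired query/image points should sit closer together
# in the shared space than mismatched ones.
pos_d = np.mean([np.sum((Wq @ q_feats[i] - Wi @ img_feats[i])**2)
                 for i in range(3)])
neg_d = np.mean([np.sum((Wq @ q_feats[i] - Wi @ img_feats[j])**2)
                 for i in range(3) for j in range(3) if j != i])
print(pos_d < neg_d)
```

Because both towers map into the same space, the image embeddings can be indexed once and queried with fresh query embeddings at serving time.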
I think the blog post is very cool, don't get me wrong, but no part of this is novel (if that's what you were going for; I can't tell). This exact end-to-end pipeline has been used in production for image search, collaborative filtering (customer and product embeddings), and various recommender systems where the unit of embedding is "pages", "products", "blog posts", or other product primitives for whatever type of business, across several different companies I've worked for.