edoliberty | 4 years ago | on: Introduction to Locality-Sensitive Hashing
But they are used and designed for different use cases. Dimensionality reduction (DR) is mostly concerned with preserving distances, such that
c * D(f(x), f(y)) < d(x, y) < C * D(f(x), f(y))
Here, f(x) is the mapping of x to a lower-dimensional space (or any other object, really). d(-,-) and D(-,-) are the original distance measure and the new distance measure, respectively. The smaller the ratio C/c, the better. Clearly, the computational overhead of applying f and D also matters.
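As an illustration (not part of the original comment), here is a hypothetical sketch of measuring that distortion ratio empirically, assuming f is a random linear projection (a classic DR map) and both d and D are Euclidean distance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: f projects from 1000 dims down to 50 via a random matrix,
# scaled so that Euclidean distances are roughly preserved on average.
n, dim, k = 200, 1000, 50
X = rng.standard_normal((n, dim))
P = rng.standard_normal((dim, k)) / np.sqrt(k)

def f(x):
    return x @ P

# Sample random pairs and record the ratio d(x, y) / D(f(x), f(y)).
ratios = []
for _ in range(500):
    i, j = rng.integers(n, size=2)
    if i == j:
        continue
    orig = np.linalg.norm(X[i] - X[j])       # d(x, y)
    proj = np.linalg.norm(f(X[i]) - f(X[j]))  # D(f(x), f(y))
    ratios.append(orig / proj)

# c and C bound the ratio from below and above; C/c near 1 means low distortion.
c, C = min(ratios), max(ratios)
print(f"c={c:.3f}, C={C:.3f}, C/c={C/c:.3f}")
```

With these (arbitrary) sizes the empirical C/c comes out close to 1, which is the point: the projection shrinks 1000 dimensions to 50 while distorting pairwise distances only mildly.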
LSH, on the other hand, is mostly concerned with probabilistically filtering and searching for nearby points efficiently. Here are some (old) class notes from a course I gave in 2013: https://edoliberty.github.io//classnotes/datamining2013/pdfs...
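To make the "probabilistic filtering" idea concrete, here is a minimal sketch (my addition, not the scheme from the notes) of one classic LSH family, random-hyperplane hashing for cosine similarity: nearby points tend to land in the same bucket, so a query only scans its own bucket instead of the whole dataset.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed parameters for illustration: 64-dim vectors, 16-bit hash keys.
dim, n_bits = 64, 16
hyperplanes = rng.standard_normal((n_bits, dim))

def lsh_key(x):
    # Each bit records which side of a random hyperplane x falls on;
    # vectors at a small angle agree on most bits with high probability.
    return tuple((hyperplanes @ x > 0).astype(int))

# Index a dataset: points hashing to the same key become candidate neighbors.
points = rng.standard_normal((500, dim))
buckets = {}
for idx, p in enumerate(points):
    buckets.setdefault(lsh_key(p), []).append(idx)

# Query: hash once, then scan only the (small) matching bucket.
q = rng.standard_normal(dim)
candidates = buckets.get(lsh_key(q), [])
```

In practice one uses many independent hash tables and unions the candidates, trading memory for recall; this single-table version only shows the filtering mechanism.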
While LSH is mathematically beautiful, it is no longer considered the gold standard for approximate nearest-neighbor (ANN) search. There are many open-source algorithms that outperform LSH on most datasets.
(I founded a vector search company: https://www.pinecone.io)