edoliberty's comments

edoliberty | 4 years ago | on: Introduction to Locality-Sensitive Hashing

My 2c: the choice of algorithm is quite complex. It is data dependent and also depends on your read/write ratio, latency and accuracy needs, memory/CPU budget, and many other factors. If I were you I'd look for a versatile solution rather than "the best" algorithm, because that choice will likely change... just sayin'...

edoliberty | 4 years ago | on: Introduction to Locality-Sensitive Hashing

Hi All, it's true that LSH (locality-sensitive hashing) and DR (dimension reduction) are closely related. They both derive their correctness from the same strong concentration phenomenon.

But, they are used/designed for different use cases. DR is mostly concerned with preserving distances such that

c * D(f(x), f(y)) < d(x, y) < C * D(f(x), f(y))

Here, f(x) is the mapping of x to a lower-dimensional space (or any other object, really). d(-,-) and D(-,-) are the original and new distance measures, respectively. The smaller the ratio C/c, the better. Clearly, the computational overhead of applying f and D also matters.
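To make the inequality concrete, here is a rough sketch (mine, not from the comment) of a Johnson-Lindenstrauss-style random projection in NumPy. The dimensions, seed, and the name `f` are arbitrary illustrations; the point is that the distance after projection stays within a modest multiplicative factor of the original:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 1000, 50  # original and reduced dimensions (arbitrary choices)

# f(x) = A @ x / sqrt(k): a random Gaussian projection. Concentration
# of measure makes ||f(x) - f(y)|| close to ||x - y|| with high probability.
A = rng.standard_normal((k, d))

def f(x):
    return A @ x / np.sqrt(k)

x, y = rng.standard_normal(d), rng.standard_normal(d)
original = np.linalg.norm(x - y)
reduced = np.linalg.norm(f(x) - f(y))
print(original, reduced, reduced / original)  # ratio should be near 1
```

Here D and d are both the Euclidean norm, and the ratio reduced/original plays the role of a sample between c and C.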

LSH, on the other hand, is mostly concerned with probabilistically filtering and searching for nearby points efficiently. Here are some (old) class notes from a course I gave in 2013: https://edoliberty.github.io//classnotes/datamining2013/pdfs...
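To illustrate the "probabilistic filtering" idea, here is a minimal random-hyperplane LSH (SimHash-style) sketch; this is my own toy example, not code from the notes, and the dimensions, bit count, and seed are arbitrary. Nearby vectors tend to hash to the same bucket, so a lookup only needs to compare against that bucket's contents:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_bits = 100, 16

# Each row of H is a random hyperplane; each hash bit records which side
# of that hyperplane the vector falls on.
H = rng.standard_normal((n_bits, d))

def lsh_bucket(x):
    return tuple((H @ x > 0).astype(int))

x = rng.standard_normal(d)
near = x + 0.01 * rng.standard_normal(d)  # tiny perturbation of x
far = rng.standard_normal(d)              # unrelated vector

print(lsh_bucket(x) == lsh_bucket(near))  # collision is likely
print(lsh_bucket(x) == lsh_bucket(far))   # collision is unlikely
```

Each bit flips with probability proportional to the angle between the two vectors, which is what makes the collision probability distance-sensitive.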

While LSH is mathematically beautiful, it is no longer considered the gold standard for ANN. There are many open source algorithms that outperform LSH on most datasets.

(I founded a vector search company: https://www.pinecone.io)
