Interesting read. Especially the lookup method based on partitioning.
I tried to implement a similar reverse image search based on dHash, as explained here: https://github.com/Rayraegah/dhash . However, I also had lookup performance problems. Exact matches are not a problem, but Hamming-distance threshold matching is. Because my project was in Python, I tried to eke out more performance by writing a BK-tree backend module in C++ ( https://github.com/mxmlnkn/cppbktree ). It was 2 to 10x faster than an existing similar module, but still too slow when looking things up in a database of millions of images. However, as lookup time tended to depend on the exact Hamming-distance threshold value, my next step would have been to try to optimize the hash, e.g., make it shorter so that only a small Hamming distance needs to be searched. But the multi-indexing method mentioned in the article looks much more promising and better tested.
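For readers unfamiliar with the data structure: a BK-tree organizes hashes by their distance from a pivot, so the triangle inequality can prune whole subtrees during a threshold query. A minimal Python sketch over 64-bit hashes with Hamming distance (a hypothetical toy, not the cppbktree API):

```python
# Minimal BK-tree over 64-bit perceptual hashes with Hamming distance.
# Hypothetical sketch for illustration, not the cppbktree API.

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two 64-bit hashes."""
    return bin(a ^ b).count("1")

class BKTree:
    def __init__(self):
        self.root = None  # node = (hash, {edge_distance: child_node})

    def add(self, h: int) -> None:
        if self.root is None:
            self.root = (h, {})
            return
        node = self.root
        while True:
            d = hamming(h, node[0])
            if d in node[1]:
                node = node[1][d]       # descend along the existing edge
            else:
                node[1][d] = (h, {})    # attach a new leaf at distance d
                return

    def query(self, h: int, threshold: int) -> list:
        """Return all stored hashes within `threshold` bits of h."""
        if self.root is None:
            return []
        results, stack = [], [self.root]
        while stack:
            value, children = stack.pop()
            d = hamming(h, value)
            if d <= threshold:
                results.append(value)
            # Triangle inequality: only subtrees whose edge label lies
            # in [d - threshold, d + threshold] can contain matches.
            for edge, child in children.items():
                if d - threshold <= edge <= d + threshold:
                    stack.append(child)
        return results
```

The pruning explains the behavior described above: the larger the threshold, the wider the admissible edge window, and the closer the query degrades toward a full linear scan.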
There is also https://rom1504.github.io/clip-retrieval/?back=https%3A%2F%2... which indexes the 5-billion-image LAION dataset (basically a subset of the images from Common Crawl) and allows both text and image queries. The code as well as the image dataset are open source (although you need a new hard drive just to store the URLs).
4 requests per image lookup x 2000 requests per second = 8000 DynamoDB Reads per Second = 28,800,000 reads per hour
The AWS price list says reserved capacity comes in at "$0.00013 per RCU" x 28,800,000 = $3,744 per hour. Does this really cost $2.6 million per month to operate?
If yes, contact me and I'll be happy to save you $2.595 million monthly by re-implementing things in C++ on bare-metal servers.
One RCU gives you one strongly consistent read of up to 4 KB _per second_. So 8,000 reads per second require 8,000 RCUs, which at list pricing comes to about $1 per hour. And that's assuming the reads actually need to be strongly consistent (otherwise it's half the price) and that no discounts apply.
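The two readings of the price differ by a factor of 3,600 because an RCU is provisioned per-second capacity billed by the hour, not a per-read charge. A quick back-of-the-envelope check, assuming the quoted list price of $0.00013 per RCU-hour:

```python
# Back-of-the-envelope DynamoDB read cost, assuming the quoted list
# price of $0.00013 per RCU-hour for provisioned capacity.
PRICE_PER_RCU_HOUR = 0.00013

reads_per_second = 4 * 2000    # 4 lookups per image x 2000 requests/s

# Mistaken reading: bill every individual read as if it were an RCU.
reads_per_hour = reads_per_second * 3600
cost_wrong = reads_per_hour * PRICE_PER_RCU_HOUR   # ~$3,744 per hour

# Correct reading: 1 RCU = 1 strongly consistent 4 KB read per second,
# so sustained capacity is 8,000 RCUs, billed hourly.
rcus = reads_per_second
cost_right = rcus * PRICE_PER_RCU_HOUR             # ~$1.04 per hour

print(f"${cost_wrong:,.2f}/hour vs ${cost_right:,.2f}/hour")
```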
I used ORB descriptors and Hamming distance for a fuzzy/partial/inexact reverse image search at my last job. While it didn't scale the way the OP's system does, it found some really interesting fuzzy matches.
It's still nowhere close to how good Google's reverse image search was several years back. Ever since they handicapped it, there just hasn't been as good a way to do reverse image searches. I've tried searching for images I know are on the internet, running them through Google, TinEye, Bing, and Yandex, to no avail.
It seems like a lot of the focus has moved from "find this exact image" to "find similar images," like you give it a picture of a Husky and it'll find more pictures of Huskies.
This was my internship subject (at another company) just before I graduated. I wonder what they used for the perceptual hash; ours was based on SIFT features. Happy to see that what I implemented would have been able to scale that much!
So when you run SIFT on an image, you get dozens (maybe hundreds) of SIFT features back. The trouble with SIFT features is that each one is a local image descriptor -- it describes a single point in the image. You can't just append the two lists of SIFT descriptors together and do a Hamming comparison on them, because it's not guaranteed that both images will have all of the same SIFT descriptors, nor that they would be in the same order.
When you want to compare images via local descriptors, you must compare every feature in one image with every feature in every other image. This is great for comparing two images together, or for finding where one image is located within another (homography matching), but it does not scale to large image sets.
In contrast, descriptors like perceptual hashes look at the entire image, and so are a _global_ image descriptor.
There are ways to convert local SIFT image descriptors into a single global image descriptor for doing more rapid lookup (Bag of Visual Words is one technique that comes to mind), but SIFT and pHash really are in two categories all their own.
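The Bag of Visual Words idea mentioned above can be sketched in a few lines: cluster all local descriptors into a "codebook" of visual words, then represent each image as a fixed-length histogram of its nearest words. This toy version uses random stand-ins for SIFT vectors and a tiny hand-rolled k-means, just to show the local-to-global conversion:

```python
import numpy as np

# Toy Bag-of-Visual-Words sketch: turn variable-length sets of local
# descriptors (SIFT-like 128-d vectors, random stand-ins here) into
# fixed-length global histograms that can be compared directly.

rng = np.random.default_rng(0)

def kmeans(points, k, iters=20):
    """Tiny k-means to build the visual-word codebook."""
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest center.
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers

def bovw_histogram(descriptors, codebook):
    """Global descriptor: normalized histogram of nearest visual words."""
    dist = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    hist = np.bincount(dist.argmin(axis=1), minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Two "images" with different numbers of local descriptors each.
img_a = rng.normal(size=(80, 128))
img_b = rng.normal(size=(120, 128))

codebook = kmeans(np.vstack([img_a, img_b]), k=16)
ha = bovw_histogram(img_a, codebook)
hb = bovw_histogram(img_b, codebook)
# Both are now fixed-length (16), so any vector distance applies.
```

Once every image is a fixed-length histogram, the all-pairs local-feature matching problem disappears and standard nearest-neighbor indexing applies.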
I have found ML image-classification models to be an excellent method of extracting a unique descriptor. The image can then be compressed into a compact signature for matching and storage.
This looks so cool. I didn't know anything about perceptual hashing, but the idea makes a lot of sense. I'm curious whether the system loses its effectiveness if a user shifts an image by a few pixels, reflects it, or applies a filter that gives every pixel a slightly different tonality. I'm also thinking of some system, maybe a small ML program, that applies a minimal amount of noise to an image so that to the human eye it looks the same but pixel-wise it is totally different.
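Some of these transformations can be probed directly. The dHash scheme linked earlier in the thread only records whether each pixel is brighter than its right-hand neighbor, so a uniform tonality shift leaves the hash unchanged, while a reflection scrambles the comparisons. A toy experiment on a synthetic 8x9 grayscale grid (each row of 8 bits needs 9 pixels; real dHash first resizes the image to this grid):

```python
import random

# Toy dHash: each bit records whether a pixel is brighter than its
# right neighbor. 8 rows x 9 pixels -> 64 bits.

def dhash(img):
    """img: 8 rows of 9 grayscale values -> 64-bit integer hash."""
    bits = 0
    for row in img:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

random.seed(1)
# Cap pixel values at 200 so a +10 brightness shift never clips.
img = [[random.randint(0, 200) for _ in range(9)] for _ in range(8)]

brighter = [[p + 10 for p in row] for row in img]       # uniform shift
mirrored = [list(reversed(row)) for row in img]         # reflection

print(dhash(img) == dhash(brighter))  # shift preserves all comparisons
print(dhash(img) == dhash(mirrored))  # reflection does not
```

Small pixel shifts and per-pixel noise fall in between: they flip only the comparisons near ties, which is exactly why matching uses a Hamming-distance threshold rather than hash equality.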
95% of the time when I'm using reverse image search, I'm just trying to find the original source of something. This is the opposite of what Google Lens focuses on, but it's a much simpler problem to solve :(
Something I felt was missing from the piece: what is the typical Hamming-distance cutoff for something like this? He explains that 2 is small enough to count as identical, but presumably it can go really high. What happens with a distance of, say, 100? And what's the state of research on false positives and adversarial attacks?
It's tricky; you have to tune it. Any index method needs a fairly low threshold, or the search devolves into a linear scan. But as you increase the threshold, you quickly reach a cutoff where the extra matches are all completely dissimilar images. So a threshold of 100 would be utterly pointless.
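There's also a simple combinatorial reason why low thresholds are essential: the number of 64-bit hashes within Hamming distance d of a query grows explosively with d, so the candidate "ball" an index has to cover quickly becomes most of the space.

```python
# Size of the Hamming ball around a 64-bit hash: the number of hashes
# within distance d. This is what any exact-radius index must cover.
from math import comb

def ball_size(bits: int, d: int) -> int:
    """Count of hashes within Hamming distance d of a fixed hash."""
    return sum(comb(bits, i) for i in range(d + 1))

for d in (2, 4, 8, 16, 32):
    print(d, ball_size(64, d))
```

At d = 2 there are only 2,081 candidates; by d = 32 the ball already contains more than half of all 2^64 possible hashes, so a threshold of 100 (beyond the maximum of 64 anyway for a 64-bit hash) is meaningless.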
starkd | 3 years ago
The ML image classification models can be used to extract a good descriptor that can be further reduced into a compact signature.
https://github.com/starkdg/pyphashml
For indexing, I've had some success with distance-based indexing. Here's a comparison of some structures I used:
https://github.com/starkdg/hftrie#comparison
Feel free to contact me if you want to discuss this further.
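One common way to reduce a model's float embedding to a compact binary signature (a hypothetical illustration, not necessarily the method behind the links above) is random-hyperplane projection: take the sign of the embedding against a set of fixed random hyperplanes, so that nearby embeddings yield signatures with small Hamming distance.

```python
import numpy as np

# Sign-LSH sketch: reduce a 512-d float embedding (e.g. from an image
# classification model) to a 64-bit binary signature. Nearby embeddings
# get signatures at small Hamming distance. Hypothetical illustration.

rng = np.random.default_rng(42)
DIM, BITS = 512, 64
planes = rng.normal(size=(BITS, DIM))  # fixed random hyperplanes

def signature(embedding: np.ndarray) -> int:
    """512-d float vector -> 64-bit integer signature."""
    bits = (planes @ embedding) > 0    # which side of each hyperplane
    out = 0
    for b in bits:
        out = (out << 1) | int(b)
    return out

emb = rng.normal(size=DIM)
noisy = emb + 0.01 * rng.normal(size=DIM)  # slightly perturbed copy
d = bin(signature(emb) ^ signature(noisy)).count("1")
print("Hamming distance between signatures:", d)
```

The appeal is that the signature then plugs straight into the Hamming-distance indexing structures discussed elsewhere in the thread.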
tiffanyh | 3 years ago
Seems like a fair amount for them to spend given your efficiency gains, and a quick ROI for them :)
HanClinto | 3 years ago
More info on pHash: https://hackerfactor.com/blog/index.php%3F/archives/432-Look...
Example of SIFT for fine-grained image matching: https://docs.opencv.org/3.4/d1/de0/tutorial_py_feature_homog...
starkd | 3 years ago
I did it here: https://github.com/starkdg/phashml
https://github.com/starkdg/pyphashml
It is available as a Python module that uses a TensorFlow model.
Feel free to message me.
fxtentacle | 3 years ago
10 billion images x4 rows each in Amazon DynamoDB with 2000 rps sounds like it'll burn through your wallet faster than most explosives...