Interesting read. Especially the lookup method based on partitioning.
I tried to implement a similar reverse image search based on dHash, as explained here: https://github.com/Rayraegah/dhash . However, I also had lookup performance problems. Exact matches are not a problem, but Hamming-distance threshold matching is. Because my project was in Python, I tried to eke out more performance by writing a BK-tree backend module in C++ ( https://github.com/mxmlnkn/cppbktree ). It was 2 to 10x faster than an existing similar module, but still too slow when looking things up in a database of millions of images. However, as lookup time tended to depend on the exact Hamming-distance threshold value, my next step would have been to try to optimize the hash, e.g., make it shorter so that only a small Hamming distance needs to be searched. But the multi-indexing method mentioned in the article looks much more promising and better tested.
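For readers unfamiliar with the data structure: a BK-tree organizes hashes by their distance from a pivot, so the triangle inequality can prune whole subtrees during a threshold query. A minimal Python sketch over 64-bit hashes with Hamming distance (a hypothetical toy, not the cppbktree API):

```python
# Minimal BK-tree over 64-bit perceptual hashes with Hamming distance.
# Hypothetical sketch for illustration, not the cppbktree API.

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two 64-bit hashes."""
    return bin(a ^ b).count("1")

class BKTree:
    def __init__(self):
        self.root = None  # node = (hash, {edge_distance: child_node})

    def add(self, h: int) -> None:
        if self.root is None:
            self.root = (h, {})
            return
        node = self.root
        while True:
            d = hamming(h, node[0])
            if d in node[1]:
                node = node[1][d]       # descend along the existing edge
            else:
                node[1][d] = (h, {})    # attach a new leaf at distance d
                return

    def query(self, h: int, threshold: int) -> list:
        """Return all stored hashes within `threshold` bits of h."""
        if self.root is None:
            return []
        results, stack = [], [self.root]
        while stack:
            value, children = stack.pop()
            d = hamming(h, value)
            if d <= threshold:
                results.append(value)
            # Triangle inequality: only subtrees whose edge label lies
            # in [d - threshold, d + threshold] can contain matches.
            for edge, child in children.items():
                if d - threshold <= edge <= d + threshold:
                    stack.append(child)
        return results
```

The pruning explains the behavior described above: the larger the threshold, the wider the admissible edge window, and the closer the query degrades toward a full linear scan.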
There is also https://rom1504.github.io/clip-retrieval/?back=https%3A%2F%2... which indexes the 5-billion-image LAION dataset (basically a subset of the images from Common Crawl) and allows both text and image queries. The code as well as the image dataset are open source (although you need a new hard drive just to store the URLs).
4 requests per image lookup x 2000 requests per second = 8000 DynamoDB Reads per Second = 28,800,000 reads per hour
The AWS price list says reserved capacity comes in at "$0.00013 per RCU" x 28,800,000 = $3,744 per hour. Does this really cost $2.6 million per month to operate?
If yes, contact me and I'll be happy to save you $2.595 million monthly by re-implementing things in C++ on bare-metal servers.
One RCU gives you one strongly consistent read of up to 4 KB _per second_. So 8,000 reads per second require 8,000 RCUs, which at list pricing comes to about $1 per hour. And that's assuming the reads actually need to be strongly consistent (otherwise it's half the price) and that no discounts apply.
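The two readings of the price differ by a factor of 3,600 because an RCU is provisioned per-second capacity billed by the hour, not a per-read charge. A quick back-of-the-envelope check, assuming the quoted list price of $0.00013 per RCU-hour:

```python
# Back-of-the-envelope DynamoDB read cost, assuming the quoted list
# price of $0.00013 per RCU-hour for provisioned capacity.
PRICE_PER_RCU_HOUR = 0.00013

reads_per_second = 4 * 2000    # 4 lookups per image x 2000 requests/s

# Mistaken reading: bill every individual read as if it were an RCU.
reads_per_hour = reads_per_second * 3600
cost_wrong = reads_per_hour * PRICE_PER_RCU_HOUR   # ~$3,744 per hour

# Correct reading: 1 RCU = 1 strongly consistent 4 KB read per second,
# so sustained capacity is 8,000 RCUs, billed hourly.
rcus = reads_per_second
cost_right = rcus * PRICE_PER_RCU_HOUR             # ~$1.04 per hour

print(f"${cost_wrong:,.2f}/hour vs ${cost_right:,.2f}/hour")
```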
I used ORB descriptors and Hamming distance for a fuzzy/partial/inexact reverse image search at my last job. While it didn't scale the way the OP's system does, it found some really interesting fuzzy matches.
It's still nowhere close to how good Google's reverse image search was several years back. Ever since they handicapped it, there just hasn't been as good a way to do reverse image searches. I've tried searching for images I know are on the internet, running them through Google, TinEye, Bing, and Yandex, to no avail.
It seems like a lot of the focus has moved from "find this exact image" to "find similar images," like you give it a picture of a Husky and it'll find more pictures of Huskies.
This was my internship subject (at another company) just before I graduated. I wonder what they used for the perceptual hash; ours was based on SIFT features. Happy to see that what I implemented would have been able to scale that much!
So when you run SIFT on an image, you get dozens (maybe hundreds) of SIFT features back. The trouble with SIFT features is that each one is a local image descriptor -- it describes a single point in the image. You can't just append the two lists of SIFT descriptors together and do a Hamming comparison on them, because it's not guaranteed that both images will have all of the same SIFT descriptors, nor that they would be in the same order.
When you want to compare images via local descriptors, you must compare every feature in one image with every feature in every other image. This is great for comparing two images together, or for finding where one image is located within another (homography matching), but it does not scale to large image sets.
In contrast, descriptors like perceptual hashes look at the entire image, and so are a _global_ image descriptor.
There are ways to convert local SIFT image descriptors into a single global image descriptor for doing more rapid lookup (Bag of Visual Words is one technique that comes to mind), but SIFT and pHash really are in two categories all their own.
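The Bag of Visual Words idea mentioned above can be sketched in a few lines: cluster all local descriptors into a "codebook" of visual words, then represent each image as a fixed-length histogram of its nearest words. This toy version uses random stand-ins for SIFT vectors and a tiny hand-rolled k-means, just to show the local-to-global conversion:

```python
import numpy as np

# Toy Bag-of-Visual-Words sketch: turn variable-length sets of local
# descriptors (SIFT-like 128-d vectors, random stand-ins here) into
# fixed-length global histograms that can be compared directly.

rng = np.random.default_rng(0)

def kmeans(points, k, iters=20):
    """Tiny k-means to build the visual-word codebook."""
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest center.
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers

def bovw_histogram(descriptors, codebook):
    """Global descriptor: normalized histogram of nearest visual words."""
    dist = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    hist = np.bincount(dist.argmin(axis=1), minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Two "images" with different numbers of local descriptors each.
img_a = rng.normal(size=(80, 128))
img_b = rng.normal(size=(120, 128))

codebook = kmeans(np.vstack([img_a, img_b]), k=16)
ha = bovw_histogram(img_a, codebook)
hb = bovw_histogram(img_b, codebook)
# Both are now fixed-length (16), so any vector distance applies.
```

Once every image is a fixed-length histogram, the all-pairs local-feature matching problem disappears and standard nearest-neighbor indexing applies.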
I have found ML image-classification models to be an excellent method of extracting a unique descriptor. The image can then be compressed into a compact signature for matching and storage.
This looks so cool. I didn't know anything about perceptual hashing, but the idea makes a lot of sense. I'm curious whether the system loses its effectiveness if a user shifts an image by a few pixels, reflects it, or applies a filter that gives every pixel a slightly different tonality. I'm also thinking of some system, maybe a small ML program, that applies a minimal amount of noise to an image so that to the human eye it looks the same but pixel-wise it is totally different.
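Some of these transformations can be probed directly. The dHash scheme linked earlier in the thread only records whether each pixel is brighter than its right-hand neighbor, so a uniform tonality shift leaves the hash unchanged, while a reflection scrambles the comparisons. A toy experiment on a synthetic 8x9 grayscale grid (each row of 8 bits needs 9 pixels; real dHash first resizes the image to this grid):

```python
import random

# Toy dHash: each bit records whether a pixel is brighter than its
# right neighbor. 8 rows x 9 pixels -> 64 bits.

def dhash(img):
    """img: 8 rows of 9 grayscale values -> 64-bit integer hash."""
    bits = 0
    for row in img:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

random.seed(1)
# Cap pixel values at 200 so a +10 brightness shift never clips.
img = [[random.randint(0, 200) for _ in range(9)] for _ in range(8)]

brighter = [[p + 10 for p in row] for row in img]       # uniform shift
mirrored = [list(reversed(row)) for row in img]         # reflection

print(dhash(img) == dhash(brighter))  # shift preserves all comparisons
print(dhash(img) == dhash(mirrored))  # reflection does not
```

Small pixel shifts and per-pixel noise fall in between: they flip only the comparisons near ties, which is exactly why matching uses a Hamming-distance threshold rather than hash equality.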
95% of the time when I'm using reverse image search, I'm just trying to find the original source of something. This is the opposite of what Google Lens focuses on, but it's a much simpler problem to solve :(
Something I felt was missing from the piece: what is the typical Hamming-distance cutoff for something like this? He explains that 2 is small enough to count as identical, but presumably it can go really high. What happens with a distance of, say, 100? And what's the state of research on false positives and adversarial attacks?
It's tricky; you have to tune it. Any index method needs a fairly low threshold, or the search devolves into a linear scan. But as you increase the threshold, you quickly reach a cutoff where the extra matches are all completely dissimilar images. So a threshold of 100 would be utterly pointless.
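There's also a simple combinatorial reason why low thresholds are essential: the number of 64-bit hashes within Hamming distance d of a query grows explosively with d, so the candidate "ball" an index has to cover quickly becomes most of the space.

```python
# Size of the Hamming ball around a 64-bit hash: the number of hashes
# within distance d. This is what any exact-radius index must cover.
from math import comb

def ball_size(bits: int, d: int) -> int:
    """Count of hashes within Hamming distance d of a fixed hash."""
    return sum(comb(bits, i) for i in range(d + 1))

for d in (2, 4, 8, 16, 32):
    print(d, ball_size(64, d))
```

At d = 2 there are only 2,081 candidates; by d = 32 the ball already contains more than half of all 2^64 possible hashes, so a threshold of 100 (beyond the maximum of 64 anyway for a 64-bit hash) is meaningless.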
starkd | 3 years ago
The ML image classification models can be used to extract a good descriptor that can be further reduced into a compact signature.
https://github.com/starkdg/pyphashml
For indexing, I've had some success with distance-based indexing. Here's a comparison of some structures I used:
https://github.com/starkdg/hftrie#comparison
Feel free to contact me if you want to discuss this further.
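One common way to reduce a model's float embedding to a compact binary signature (a hypothetical illustration, not necessarily the method behind the links above) is random-hyperplane projection: take the sign of the embedding against a set of fixed random hyperplanes, so that nearby embeddings yield signatures with small Hamming distance.

```python
import numpy as np

# Sign-LSH sketch: reduce a 512-d float embedding (e.g. from an image
# classification model) to a 64-bit binary signature. Nearby embeddings
# get signatures at small Hamming distance. Hypothetical illustration.

rng = np.random.default_rng(42)
DIM, BITS = 512, 64
planes = rng.normal(size=(BITS, DIM))  # fixed random hyperplanes

def signature(embedding: np.ndarray) -> int:
    """512-d float vector -> 64-bit integer signature."""
    bits = (planes @ embedding) > 0    # which side of each hyperplane
    out = 0
    for b in bits:
        out = (out << 1) | int(b)
    return out

emb = rng.normal(size=DIM)
noisy = emb + 0.01 * rng.normal(size=DIM)  # slightly perturbed copy
d = bin(signature(emb) ^ signature(noisy)).count("1")
print("Hamming distance between signatures:", d)
```

The appeal is that the signature then plugs straight into the Hamming-distance indexing structures discussed elsewhere in the thread.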
tiffanyh | 3 years ago
Seems like a fair amount for them to spend given your efficiency gains, and a quick ROI for them :)
HanClinto | 3 years ago
More info on pHash: https://hackerfactor.com/blog/index.php%3F/archives/432-Look...
Example of SIFT for fine-grained image matching: https://docs.opencv.org/3.4/d1/de0/tutorial_py_feature_homog...
starkd | 3 years ago
I did it here: https://github.com/starkdg/phashml
https://github.com/starkdg/pyphashml
It is available as a Python module that uses a TensorFlow model.
Feel free to message me.
fxtentacle | 3 years ago
10 billion images x4 rows each in Amazon DynamoDB with 2000 rps sounds like it'll burn through your wallet faster than most explosives...