top | item 36683167

rivo | 2 years ago

And if you just want to go by the pixel data, look into "perceptual hashing". https://github.com/rivo/duplo works quite well for me, even when dealing with watermarks or slight colour correction / sharpening. You could even go further and improve your success rate with Neural Hash or something similar.
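
For anyone unfamiliar with the idea, here is a minimal sketch of one of the simplest perceptual hashes (an "average hash") plus a Hamming-distance comparison. It is an illustration only and not necessarily the algorithm the linked library uses. The function names and the tiny 4x4 brightness grids are made up for the example; a real pipeline would first decode the image and downscale it to a small grid.

```python
def average_hash(gray):
    """Return an integer whose bits mark pixels brighter than the mean.

    `gray` is a small 2D list of brightness values (assumed to be an
    already-downscaled image, e.g. 8x8 in practice).
    """
    pixels = [v for row in gray for v in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for v in pixels:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Hypothetical 4x4 image: dark on the left, bright on the right.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [11, 21, 201, 211],
]
# Same picture with a uniform brightness shift.
brightened = [[min(255, v + 30) for v in row] for row in original]
# A genuinely different picture: the original mirrored horizontally.
mirrored = [row[::-1] for row in original]

near = hamming(average_hash(original), average_hash(brightened))
far = hamming(average_hash(original), average_hash(mirrored))
```

A small distance suggests a near-duplicate and a large one suggests a different image; where to put the threshold is a tuning decision that depends on the collection.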

eviks|2 years ago

Is there an option to just calculate a hash of the image data alone (the pixels, not the full file of image data + metadata) without any transforms? That way, if it matches, you can be 100% certain it's the same image.
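
A rough sketch of what that could look like: hash the decoded pixel buffer with an ordinary cryptographic hash instead of hashing the file. The `pixel_digest` helper and its parameters are hypothetical. Decoding is assumed to happen elsewhere; with Pillow, for example, the bytes could come from something like `Image.open(path).convert("RGB").tobytes()`.

```python
import hashlib


def pixel_digest(width, height, mode, pixels):
    """SHA-256 over decoded pixel data only (hypothetical helper).

    Dimensions and channel layout are mixed in so that, for example,
    a 2x1 and a 1x2 image with the same bytes do not collide.
    """
    h = hashlib.sha256()
    h.update(f"{mode}:{width}x{height}:".encode("ascii"))
    h.update(pixels)
    return h.hexdigest()


# Two decoded 2x1 RGB images with identical pixels; any EXIF or other
# metadata never reaches the hash because only the pixels are passed in.
pixels_a = bytes([255, 0, 0, 0, 255, 0])
pixels_b = bytes([255, 0, 0, 0, 255, 0])
# One channel value off by one.
pixels_c = bytes([255, 0, 0, 0, 254, 0])

same = pixel_digest(2, 1, "RGB", pixels_a) == pixel_digest(2, 1, "RGB", pixels_b)
changed = pixel_digest(2, 1, "RGB", pixels_a) == pixel_digest(2, 1, "RGB", pixels_c)
```

Equal digests mean identical pixels (short of a SHA-256 collision), but the reverse is fragile: any re-encode of a lossy format changes the pixels, and different decoders may not produce byte-identical output for the same JPEG.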

mceachen|2 years ago

Unfortunately, (almost all!) image hashing algorithms don't detect color differences: they map images to greyscale first. This may be fine for many situations, but it means you get the same result for a sepia tint, a full color original with incorrect white balance, and the final result you made after mucking with channels for a couple of minutes.

I also found that there really isn't one "best" image hash algorithm. Using _several different_ image hash algos turns out to be only fractionally more expensive at both compute and query time, and substantially improves both precision and recall. I'm using a mean hash, a gradient diff, and a DCT hash, each computed on all three CIELAB-based layers, so they're sensitive to both brightness and color differences.
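
To make the per-layer idea concrete, here is a simplified sketch, not the setup described above: it uses plain RGB channels instead of CIELAB layers, leaves out the DCT hash, and all function names are invented for the example. It shows two images that a brightness-only mean hash cannot tell apart but a per-channel signature can.

```python
def mean_hash(plane):
    """One bit per pixel: is it above the plane's mean?"""
    values = [v for row in plane for v in row]
    mean = sum(values) / len(values)
    return tuple(v > mean for v in values)


def gradient_hash(plane):
    """One bit per horizontal neighbour pair: is the left pixel brighter?"""
    return tuple(row[x] > row[x + 1] for row in plane for x in range(len(row) - 1))


def split_channels(image):
    """Turn rows of (c0, c1, c2) tuples into three single-channel planes."""
    return [[[px[c] for px in row] for row in image] for c in range(3)]


def signature(image):
    """Mean and gradient hash for every channel, as a list of bit tuples."""
    sig = []
    for plane in split_channels(image):
        sig.append(mean_hash(plane))
        sig.append(gradient_hash(plane))
    return sig


def distance(sig_a, sig_b):
    """Total number of differing bits across all component hashes."""
    return sum(a != b for ha, hb in zip(sig_a, sig_b) for a, b in zip(ha, hb))


def to_gray(image):
    """Crude brightness: integer mean of the channels (a simplification)."""
    return [[sum(px) // 3 for px in row] for row in image]


RED, GREEN = (200, 0, 0), (0, 200, 0)
image_a = [[RED, GREEN], [RED, GREEN]]
image_b = [[GREEN, RED], [GREEN, RED]]  # same brightness, colors swapped

gray_match = mean_hash(to_gray(image_a)) == mean_hash(to_gray(image_b))
color_distance = distance(signature(image_a), signature(image_b))
```

Here the greyscale hashes are identical while the per-channel signature differs in 12 bits. Note that a mean hash normalizes against its own mean, so a uniform tint or brightness offset alone would not change these particular bits.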

rivo|2 years ago

The library I posted uses colour information. It won't map to greyscale first.