item 28229291

Show HN: Neural-hash-collider – Find target hash collisions for NeuralHash

623 points | anishathalye | 4 years ago | github.com

351 comments

[+] anishathalye|4 years ago|reply
The README (https://github.com/anishathalye/neural-hash-collider#how-it-...) explains in a bit more detail how these adversarial attacks work. This code pretty much implements a standard adversarial attack against NeuralHash. One slightly interesting part was replacing the thresholding with a differentiable approximation. I figured I'd share this here in case anyone is interested in seeing what the code to generate adversarial examples looks like; I don't think anyone in the big thread on the topic (https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX/issue...) has shared attack code for NeuralHash in particular yet.
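The core trick described above, replacing the hard sign threshold with a smooth surrogate so gradients can flow, can be sketched with a toy stand-in for the model. Nothing below is the real NeuralHash: a random linear map plays the role of the network, and the dimensions and step counts are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the NeuralHash pipeline: a random linear map whose
# outputs are thresholded into bits. The real model is a deep CNN; this
# only illustrates the structure of the attack.
W = rng.standard_normal((16, 64))
W /= np.linalg.norm(W, axis=1, keepdims=True)  # keep gradients well-scaled

def hard_hash(x):
    return (W @ x > 0).astype(int)      # non-differentiable thresholding

target_bits = rng.integers(0, 2, 16)    # the hash we want to collide with
target_signs = 2 * target_bits - 1      # map {0,1} -> {-1,+1}

x = np.zeros(64)                        # the "image" we perturb
lr = 0.1
for _ in range(2000):
    s = np.tanh(W @ x)                  # differentiable approximation of sign
    # Gradient of 0.5 * ||tanh(Wx) - target||^2 with respect to x
    x -= lr * (W.T @ ((s - target_signs) * (1 - s**2)))

assert (hard_hash(x) == target_bits).all()  # hard hash now matches the target
```

The real attack adds a perceptual-similarity penalty so the perturbed image still looks like the original; this sketch omits that term and only shows how the threshold is relaxed.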
[+] IncRnd|4 years ago|reply
Nicely done. Thank you for sharing.

I'd like to share the following paper for anyone else who may be interested. It is about watermarking rather than a preimage attack.

"Adversarial Embedding: A robust and elusive Steganography and Watermarking technique" https://arxiv.org/abs/1912.01487

Unfortunately, the existence of invisible watermarking demonstrates a separate attack on the hash. Instead of a preimage attack, this might be able to change the hash of an image that is suspected of already being a match. A true-positive would be changed into a false-negative.

[+] dang|4 years ago|reply
Ongoing related threads:

Apple defends anti-child abuse imagery tech after claims of ‘hash collisions’ - https://news.ycombinator.com/item?id=28225706 - Aug 2021 (401 comments)

Hash collision in Apple NeuralHash model - https://news.ycombinator.com/item?id=28219068 - Aug 2021 (662 comments)

Convert Apple NeuralHash model for CSAM Detection to ONNX - https://news.ycombinator.com/item?id=28218391 - Aug 2021 (177 comments)

(I just mean related to this particular project. To list the threads related to larger topic would be...too much.)

[+] vermilingua|4 years ago|reply
The integrity of this entire system now relies on the security of the CSAM hash database, which has just dramatically increased in value to potential attackers.

All it would take now is for one CSAM hash to become publicly known, then someone uploading collided iPhone wallpapers to wallpaper download sites. That many false positives would overload whatever administrative capacity there is to review reports in a matter of days.

[+] Matheus28|4 years ago|reply
There's no need for someone to get the entire CSAM database. If they go on the darknet and just find enough images (or hashes) that would trip Apple's system, that would be enough. I'd assume any publicly available image on the darknet would likely also be in the CSAM database.
[+] ec109685|4 years ago|reply
No, there’s another private hash function that also has to match the known CSAM image for an image to be considered a match.

That one can’t be figured out through this technique.

[+] simondotau|4 years ago|reply
Before they make it to human review, photos in decrypted vouchers have to pass the CSAM match against a second classifier that Apple keeps to itself.
[+] Fnoord|4 years ago|reply
To assume CP is reviewed manually is simply wrong. You don't want to put such weight on an individual. You want to automate it as much as possible, with as few false positives (and false negatives) as possible.

For example, take a wallpaper, let's say it's the Windows XP wallpaper. There's no human skin color in it at all, so you can be reasonably sure it isn't CP. You wouldn't need advanced ML for that.

And they can have multiple checksums, just like a tarball or package can have a CRC32, an MD5, and a SHA512. Just because one of these matches doesn't mean the others do. The only problem is keeping these DBs of hashes secret, but that could very well be a reason the scanning isn't done locally.
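The multiple-checksum analogy can be sketched with standard-library hashes (the byte strings here are made up):

```python
import hashlib
import zlib

data_a = b"original tarball contents"
data_b = b"tampered tarball contents"

def digests(data: bytes) -> dict:
    # Three independent checksums, as a tarball or package might ship with.
    return {
        "crc32": format(zlib.crc32(data), "08x"),
        "md5": hashlib.md5(data).hexdigest(),
        "sha512": hashlib.sha512(data).hexdigest(),
    }

da, db = digests(data_a), digests(data_b)

# Forging a single checksum is often feasible (CRC32 trivially, MD5
# practically), but a forgery has to match every checksum at once to
# go unnoticed.
matching = [name for name in da if da[name] == db[name]]
```

For these two inputs all three digests differ, so `matching` is empty; a collision attack would need to make that list contain all three names simultaneously.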

[+] romeovs|4 years ago|reply
A lot has been said about using this as an attack vector by possibly poisoning a victims iPhone with an image that matches a CSAM hash.

But couldn't this also be used to circumvent the CSAM scanning, by converting images that are in the CSAM database into visually similar images that won't match the hash anymore? That would effectively defeat the CSAM scanning Apple and others are trying to put into place and render the whole system moot.

One could argue that these spoofed images could also be added to the CSAM database, but what if you spoof them to have hashes of extremely common images (like common memes)? Adding memes to the database would render the whole scheme unmanageable, no?

Or am I missing something here?

So we'd end up with a system that:

1. Can't be reliably used to track actual criminal offenders (they'd just be able to hide) without rendering the whole database useless.

2. Can be used to attack anyone by making it look like they have criminal content on their iPhones.

[+] chongli|4 years ago|reply
> Or am I missing something here?

Wouldn't it be easier for offenders to avoid Apple products? That requires no special computer expertise and involves no risk on their part.

[+] argvargc|4 years ago|reply
What we're describing at this point is effectively the same as a system of automatically flagging users as potential criminals based on something as manipulable as a filename.
[+] xucheng|4 years ago|reply
In addition to the attacks that have been widely discussed on HN, such as converting a legit image to be detected as CSAM (false positive) or circumventing detection of a real CSAM image (false negative), I think this can also be used to mount a DoS attack or to censor arbitrary images.

It works like this. First, find your target images, which are either widely available (like internet memes, for a DoS attack) or images you want to censor. Then, compute their NeuralHashes. Next, use the hash collision tool to modify real CSAM images to have the same NeuralHash as the target images. Finally, report these adversarial CSAM images to the government. The result is that the attackers would successfully add the targeted NeuralHashes to the CSAM database, and people who store these legit images will then be flagged.

[+] iaw|4 years ago|reply
Really naive question: what's to stop Apple from using two distinct and separate visual hashing algorithms? Wouldn't the collision likelihood decrease drastically in that scenario?

Again, really naive but it seems like if you have two distinct multi-dimensional hashes it would be much harder to solve the gradient descent problem.
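A toy sketch of why two hashes alone barely help: if both models are public, the attacker just sums the two losses and runs the same gradient attack. (Random linear maps stand in for the hash models below; everything is invented for illustration. The real defence is secrecy, since you can't take gradients through a model you don't have.)

```python
import numpy as np

rng = np.random.default_rng(1)

def make_hash(bits, dim):
    # Random linear map + sign threshold as a stand-in perceptual hash.
    W = rng.standard_normal((bits, dim))
    return W / np.linalg.norm(W, axis=1, keepdims=True)

W1, W2 = make_hash(16, 128), make_hash(16, 128)
t1 = 2 * rng.integers(0, 2, 16) - 1      # target signs for hash 1
t2 = 2 * rng.integers(0, 2, 16) - 1      # target signs for hash 2

x = np.zeros(128)
for _ in range(3000):
    s1, s2 = np.tanh(W1 @ x), np.tanh(W2 @ x)
    # Joint loss: attacking two known hashes is not harder in kind,
    # just a sum of two gradient terms.
    grad = W1.T @ ((s1 - t1) * (1 - s1**2)) + W2.T @ ((s2 - t2) * (1 - s2**2))
    x -= 0.1 * grad

assert ((W1 @ x > 0).astype(int) == (t1 > 0)).all()  # hash 1 collides
assert ((W2 @ x > 0).astype(int) == (t2 > 0)).all()  # hash 2 collides
```

If the second hash function is secret (as Apple claims theirs is), the second gradient term simply can't be computed, which is the point made in the replies below.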

[+] 63|4 years ago|reply
I'm fairly sure they do, actually. It was in one of the articles earlier today that Apple has a distinct, secret algorithm they perform on suspected CSAM server side after it gets flagged by the client side neural hash. Then only after 30 such images from a single user are identified as CSAM by both algorithms will they be sent to a human reviewer who will confirm their contents. Then, finally, law enforcement will be alerted.

There has been a lot of hyperbole going around and the original premise that this is a breach of privacy is still true, but in my opinion the actual repercussions of attacks and collisions are being grossly exaggerated. One would have to create a collision with known CSAM for both algorithms (one of which is secret) which also overlaps with a legal porn image that could be misconstrued as CSAM by a human reviewer, or at the very least create and distribute hundreds of double collisions to DOS the reviewers.

[+] zepto|4 years ago|reply
They do. The system isn't vulnerable to these collision attacks. The people saying it is just aren't aware of how the system works.
[+] heavyset_go|4 years ago|reply
It's common for two unrelated images to come up as false positives when comparing hashes across different unrelated perceptual hashing methods.
[+] yoz-y|4 years ago|reply
To me the most interesting findings from this fiasco were:

1. People actually do use generally publicly available services to store and distribute CP (as suggested by the number of reports filed by Facebook)

2. A lot of people evidently use iCloud Photo Library to store images other than pictures they took themselves. This is not really surprising; I've learned that the answer to "does anybody ever...?" questions is always "yes". It is a bit weird though, since the photo library is terrible for this use case.

[+] ThinBold|4 years ago|reply
Despite the fact that Apple scanning our images is a horrible privacy practice, I don't get why some people think this is an ineffective idea.

Surely you can easily fabricate innocent images whose NeuralHash matches the database. But in what way are you going to send them to victims and convince them to save them to their photo library? The moment you send it via WhatsApp, FB will stop you because (they think) it is a problematic image. And even if the image does land, it has to look like some cats and dogs or the receiver will just ignore it. (Even worse, the receiver may report you.) And even if your image does look like cats and dogs, it has to pass another automatic test on the server side that uses another obfuscated, constantly-updating algorithm. After that, even more tests if Apple really wants.

That means your image needs to collide ≥ three times: once against the open hash, once against the obfuscated one, and once against a Turing (human) test.

Gmail scans your attachments and most people are cool with it. I highly doubt that Apple has any reason to withdraw this.

[+] Alupis|4 years ago|reply
So, what does Apple get out of all this, except negative attention, erosion of their image, possible privacy lawsuits, etc?

I just don't understand what Apple's motivation would have been here. Surely this fallout could have been anticipated?

[+] spullara|4 years ago|reply
Apple is scanning files locally before they are uploaded to iCloud in order to avoid storing unencrypted photos in iCloud while still detecting CSAM. All the other storage providers already scan the images uploaded to their servers. I guess you can decide which is better. Here is Google's report on it:

https://transparencyreport.google.com/child-sexual-abuse-mat...

[+] jjcon|4 years ago|reply
> in order to avoid storing unencrypted photos within iCloud

To be clear, Apple does not utilize E2E in iCloud. They can (and already do) scan iCloud contents

[+] ncw96|4 years ago|reply
Apple has said this is not the final version of the hashing algorithm they will be using: https://www.vice.com/en/article/wx5yzq/apple-defends-its-ant...
[+] only_as_i_fall|4 years ago|reply
Does it matter? Unless they're going to totally change the technology, I don't see how they can do anything but buy time until it's reverse engineered. After all, the code runs locally.

If Apple wants to defend this they should try to explain how the system will work even if generating adversarial images is trivial.

[+] Scaevolus|4 years ago|reply
NeuralHash collisions are interesting, but the way Apple is implementing their scanner, it's impossible to extract the banned hashes directly from the local database.

There are other ways to guess what the hashes are, but I can't think of legal ones.

> Matching-Database Setup. The system begins by setting up the matching database using the known CSAM image hashes provided by NCMEC and other child-safety organizations. First, Apple receives the NeuralHashes corresponding to known CSAM from the above child-safety organizations. Next, these NeuralHashes go through a series of transformations that includes a final blinding step, powered by elliptic curve cryptography. The blinding is done using a server-side blinding secret, known only to Apple. The blinded CSAM hashes are placed in a hash table, where the position in the hash table is purely a function of the NeuralHash of the CSAM image. This blinded database is securely stored on users’ devices. The properties of elliptic curve cryptography ensure that no device can infer anything about the underlying CSAM image hashes from the blinded database.

https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...
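The blinding step quoted above can be sketched in miniature. Pure-Python elliptic curve scalar multiplication is lengthy, so this uses modular exponentiation in a prime-field group, which has the same one-way "multiply by a secret" structure; every constant here is invented, none of it is Apple's actual scheme.

```python
import hashlib

P = 2**127 - 1             # a Mersenne prime; toy modulus, not a real curve
G = 5                      # toy generator
SERVER_SECRET = 0x5EC12E7  # server-side blinding secret (made up)

def hash_to_group(neural_hash_hex: str) -> int:
    # Deterministically map a NeuralHash to a group element.
    e = int.from_bytes(hashlib.sha256(bytes.fromhex(neural_hash_hex)).digest(), "big")
    return pow(G, e % (P - 1), P)

def blind(neural_hash_hex: str) -> int:
    # "Scalar multiplication" by the secret: without SERVER_SECRET, the
    # device can check set membership but cannot recover the underlying
    # hashes or test candidate hashes against the blinded table.
    return pow(hash_to_group(neural_hash_hex), SERVER_SECRET, P)

table_entry = blind("deadbeefdeadbeefdeadbeef")  # 96-bit hash, invented
```

The table position is a function of the NeuralHash alone, but the stored value depends on the secret, which is what lets the device do a blind lookup without learning the database contents.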

[+] throwaway384950|4 years ago|reply
Some people seem to be confused why a hash collision of a cat and a dog matters. Here's a potential attack: share (legal) NSFW pictures that are engineered to have a hash collision with CSAM to get someone else in trouble. The pictures are flagged as CSAM, and they also look suspicious to a human reviewer (maybe not enough context in the image to identify the subject's age). To show that this can be done with real NSFW pictures, here is an example, using an NSFW image from a subreddit's top posts of all time.

Here is the image (NSFW!): https://i.ibb.co/Ct64Cnt/nsfw.png

Hash: 59a34eabe31910abfb06f308
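For reference, NeuralHash values like the one above are 96 bits (24 hex characters), and a "collision" in these threads means an exact match. A small helper makes the comparison concrete (the bitwise comparison is an assumption for illustration, not Apple's matching code):

```python
def hamming_hex(h1: str, h2: str) -> int:
    # Number of differing bits between two hex-encoded hashes;
    # 0 means an exact collision.
    return bin(int(h1, 16) ^ int(h2, 16)).count("1")

reported = "59a34eabe31910abfb06f308"
assert len(reported) * 4 == 96               # NeuralHash is 96 bits
assert hamming_hex(reported, reported) == 0  # identical hashes collide
```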

[+] robertoandred|4 years ago|reply
Does anyone save porn to their personal photo libraries? Especially porn as suspicious as the image you posted?
[+] ec109685|4 years ago|reply
Keep in mind that you have to also collide with another perceptual hash function that only Apple has to trigger a match.
[+] tandav|4 years ago|reply
Does Google Chrome scan downloaded images?
[+] cirrus3|4 years ago|reply
You seem to be assuming a human cannot tell the difference between some random NSFW content and some legit known CSAM, 30 times. Try again.
[+] ipiz0618|4 years ago|reply
Great work. I hope people keep hacking the system to lower its credibility. This idea is just beyond insane, and the plan to have manual checks of users' photos on their own devices sounds like what China is doing - not great.
[+] jchw|4 years ago|reply
I am strongly against Apple’s decision to do on-device CSAM detection, but: wasn’t there a secondary hash whose database is not shared? In theory you need to collide with both to truly defeat the design, right?
[+] onepunchedman|4 years ago|reply
This is just getting wilder and wilder by the day, how spectacularly this move has backfired. As others have commented, at this point all you need is someone willing to sell you the CSAM hashes on the darknet, and this system is transparently broken.

Until that day, just send known CSAM to any person you'd like to get in trouble (make sure they have iCloud sync enabled), be it your neighbour or a political figure, and start a PR campaign accusing the person of being investigated for it. The whole concept is so inherently flawed it's crazy they haven't been sued yet.

[+] dannyw|4 years ago|reply
The "send known CSAM" attack has existed for a while but never made sense. However, this technology enables a new class of attacks: "send legal porn, collided to match CSAM perceptual hashes".

With the previous status quo:

1. The attacker faces charges of possessing and distributing child pornography

2. The victim may be investigated and charged with child pornography if LEO is somehow alerted (which requires work, and can be traced to the attacker).

Poor risk/reward payoff, specifically the risk outweighs the reward. So it doesn't happen (often).

---

With the new status quo of lossy, on-device CSAM scanning and automated LEO alerting:

1. The attacker never sends CSAM, only material that collides with CSAM hashes. They will be looking at CFAA, extortion, and blackmail charges.

2. The victim will be automatically investigated by law enforcement, due to Apple's "Safety Voucher" system. The victim will be investigated for possessing child pornography, particularly if the attacker collides legal pornography that may fool a reviewer inspecting a 'visual derivative'.

Great risk/reward payoff. The reward dramatically outweighs the risk, as you can get someone in trouble for CSAM without ever touching CSAM yourself.

If you think ransomware is bad, just imagine CSAM-collision ransomware. Your files will be replaced* with legal pornography that is designed specifically to collide with CSAM hashes and result in automated alerting to law enforcement. Pay X monero within the next 30 minutes, or quite literally, you may go to jail, and be charged with possessing child pornography, until you spend $XXX,XXX on lawyers and expert testimony that demonstrates your innocence.

* Another delivery mechanism for this is simply sending collided photos over WhatsApp, as WhatsApp allows for up to 30 media images in one message, and has settings that will automatically add these images to your iCloud photo library.

[+] shuckles|4 years ago|reply
Why wait? Just send them the pictures on Facebook Messenger or Gmail or Dropbox today.
[+] cookiengineer|4 years ago|reply
Imagine being a parent who took pictures of their own children bathing naked in their own backyard.

I don't know about you, but my parents certainly have lots of embarrassing pictures of me in their photo album.

There will be so many false positives in that system, it's ridiculous. It doesn't necessarily have to be a falsely colliding hash; there are legitimate use cases that - by definition - are impossible to train neural nets on unless the data is being used illegally by Apple.

[+] robertoandred|4 years ago|reply
Why would anyone save CSAM to their photo library?
[+] ryanmarsh|4 years ago|reply
Ok so now all we have to do is get a phone, load it with adversarial images that have hashes from the CSAM database, and wait and see what happens. Basically a honeypot. Get some top civil rights attorneys involved. Take the case to the Supreme Court. Get precedent set right.

Lawfare

[+] ec109685|4 years ago|reply
The adversarial images have to match both the NeuralHash output of CSAM, plus another private perceptual hash that points to the same image that only Apple has access to, plus a human reviewer needs to agree it is CSAM, and this has to happen for 30 images.
[+] robertoandred|4 years ago|reply
Where would you get the CSAM hashes?
[+] Banditoz|4 years ago|reply
That was scary fast. Is there a point in using this algorithm for its intended purpose now?
[+] dannyw|4 years ago|reply
If the intended purpose is to lead by example and eventually mandate code on every computing device (phone and computer) that scans all files against a government provided database, then yes, that purpose still exists and this algorithm still works for it.

Just wait and watch - I guarantee you that Apple will be talking about CSAM in at least one anti-trust legal battle about why they shouldn't be broken up. Because a walled garden means they can oppress citizens on behalf of governments better.

[+] zepto|4 years ago|reply
Yes, because this isn’t a weakness in the design. There is nothing scary fast about it. It was obvious and anticipated in the threat model.
[+] endisneigh|4 years ago|reply
Why does it matter? The photo looks nothing like the target.

If someone looks at the two images, wouldn’t they see they’re not the same, and therefore that the original image was mistakenly linked with the target?

[+] dannyw|4 years ago|reply
Apple's reviewers, by law, cannot look at the target. No one except NCMEC is allowed to possess the target (CSAM material).

So Apple will be looking at a low-res grayscale image of whatever the collided image is, which could be legal adult pornography (let's say: a screengrab of legal "teen" 18+ porn), but the CSAM filter tells it that it's abuse material!

What would you do as the Apple reviewer?

(Hint: You only have one option, as you are legally mandated to report).

[+] Alupis|4 years ago|reply
You could pollute the pool and overwhelm their human review process, making it untenable to operate. And that's if you just wanted to pollute it with obvious non-CSAM content.