top | item 28229015

bitwise-evan | 4 years ago

> an adversary could trick Apple’s algorithm into erroneously matching an existing image

This is a very real, possible attack. Apple ships its CSAM model on device, so any attacker can have a copy of the model. Then the attacker crafts an image that triggers a CSAM hash match but looks like a panda [1]. Now the attacker sends tons of triggering photos to the unsuspecting victim, who then gets questioned by the FBI.

1: https://medium.com/@ml.at.berkeley/tricking-neural-networks-...
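To make the attack shape concrete, here's a toy sketch. The real NeuralHash is a neural network; the stand-in below is just a random linear projection thresholded to bits, which is purely illustrative. But the attack structure is the same: because the model (and hence its gradients) ships on device, an attacker can run gradient descent on an innocuous image until its hash collides with a targeted one. All names here (`phash`, `panda`, `target`) are hypothetical.

```python
import numpy as np

# Toy stand-in for a perceptual hash: a random linear projection
# thresholded to bits. (NeuralHash is a neural network; this is
# only a sketch of the attack shape, not Apple's algorithm.)
rng = np.random.default_rng(0)
DIM, BITS = 64, 16                  # tiny 8x8 "image", 16-bit hash
W = rng.standard_normal((BITS, DIM))

def phash(x):
    return (W @ x > 0).astype(int)

target = rng.standard_normal(DIM)   # image whose hash we want to collide with
panda = rng.standard_normal(DIM)    # innocuous image the attacker perturbs
want = phash(target)

# Gradient descent on a smooth collision loss: a squared hinge that
# pushes each projection past the sign the target hash demands, plus
# a small L2 penalty keeping the result close to the original panda.
signs = 2.0 * want - 1.0            # {0,1} bits -> {-1,+1} signs
x = panda.copy()
for _ in range(4000):
    resid = np.maximum(0.0, 1.0 - signs * (W @ x))
    grad = -2.0 * (W.T @ (signs * resid)) + 0.002 * (x - panda)
    x -= 0.003 * grad

print("hash collides:", np.array_equal(phash(x), want))
print("distortion:", float(np.linalg.norm(x - panda)))
```

The attacker ends up with an image that still looks like the original (small distortion) but hashes identically to the blocklisted one. The Medium article above demonstrates the same idea against real image classifiers.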


shadowfacts | 4 years ago

> Now the attacker sends tons of triggering photos to the unsuspecting victim, who now gets questioned by the FBI.

That's glossing over the middle part where a human reviewer at Apple (before it even gets to law enforcement) actually looks at the images, goes "oh, these are actually pandas," and realizes they were erroneously detected.

carom | 4 years ago

So the attacker creates an image, and then the user has to download it. Then the FBI digs in, sees it was a crafted false positive, and begins to investigate who sent it and why. Then the user takes civil action against the person who sent it for harassment.

simondotau | 4 years ago

More precisely, 30 carefully crafted false positives. All of which need to be imported into your iPhone's photo library to sit alongside pictures of your dog and your mum. And then they have to get past human review. Not impossible, but so far beyond implausible that it can be dismissed as ridiculous.

And if this trick ever works, it could only be done once before Apple has the opportunity to plug holes in their NeuralHash algorithm and fix any deficiencies in the manual review process.