
Universal adversarial perturbations

108 points | legatus | 9 years ago | arxiv.org | reply

46 comments

[+] legatus|9 years ago|reply
Abstract: Given a state-of-the-art deep neural network classifier, we show the existence of a universal (image-agnostic) and very small perturbation vector that causes natural images to be misclassified with high probability. We propose a systematic algorithm for computing universal perturbations, and show that state-of-the-art deep neural networks are highly vulnerable to such perturbations, albeit being quasi-imperceptible to the human eye. We further empirically analyze these universal perturbations and show, in particular, that they generalize very well across neural networks. The surprising existence of universal perturbations reveals important geometric correlations among the high-dimensional decision boundary of classifiers. It further outlines potential security breaches with the existence of single directions in the input space that adversaries can possibly exploit to break a classifier on most natural images.
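The paper's algorithm is iterative; as a rough illustration, here is a toy sketch of the accumulation loop with a random linear "classifier" standing in for a deep net (the paper's actual method aggregates DeepFool steps; the gradient nudge, step size, and norm radius below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a classifier: a fixed 2-class linear model on 10-d inputs.
W = rng.normal(size=(2, 10))

def predict(x):
    return int(np.argmax(W @ x))

def project_l2(v, radius):
    """Project v back onto the L2 ball of the given radius."""
    n = np.linalg.norm(v)
    return v if n <= radius else v * (radius / n)

def universal_perturbation(images, radius=0.5, step=0.1, epochs=5):
    """Greedy loop in the spirit of the paper: for each image the current
    shared perturbation fails to fool, nudge the perturbation toward the
    decision boundary, then project back onto the norm ball."""
    v = np.zeros(images.shape[1])
    for _ in range(epochs):
        for x in images:
            if predict(x + v) == predict(x):
                # Direction that flips the linear score -- a crude stand-in
                # for the per-image DeepFool step used in the paper.
                d = W[1 - predict(x)] - W[predict(x)]
                v = project_l2(v + step * d / np.linalg.norm(d), radius)
    return v

images = rng.normal(size=(20, 10))
v = universal_perturbation(images)
fooled = sum(predict(x + v) != predict(x) for x in images)
print(f"fooling rate: {fooled / len(images):.0%}")
```

The key ingredients match the paper: loop over a set of images, update the single shared perturbation whenever it fails on the current image, and keep its norm small by projection.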
[+] kough|9 years ago|reply
Super interesting. I'm on mobile and haven't had time to read the whole paper yet - would it be feasible to continuously compute these perturbation vectors during training and include them as part of a larger heuristic? For instance, to incorporate the objective of maximizing the size of the perturbation vector necessary for misclassification? The goal being to end up with a net that is more resistant to such perturbations.
[+] danbruc|9 years ago|reply
This seems to imply that the features learnt by neural networks are very different from the features humans use to distinguish the same objects, because the networks are affected by distortions that barely interfere with the features humans rely on.
[+] danieltillett|9 years ago|reply
One thing is that neural networks are much smaller than human brains and most likely have far fewer overlapping redundant systems. If you had three separate neural networks that voted on a consensus, you might find it much harder to find adversarial inputs.
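A sketch of that consensus scheme, with toy linear models standing in for the three networks (everything here is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(2)

# Three independently initialized "networks" for the same 2-class task.
models = [rng.normal(size=(2, 10)) for _ in range(3)]

def predict(model, x):
    return int(np.argmax(model @ x))

def consensus(x):
    """Majority vote across the ensemble: an adversarial perturbation now
    has to fool at least two of the three models simultaneously."""
    votes = [predict(m, x) for m in models]
    return max(set(votes), key=votes.count)

x = rng.normal(size=10)
print(consensus(x))
```

Note, though, that the paper's finding that perturbations transfer well across architectures cuts against this defense: if the models share vulnerable directions, the vote may not help much.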
[+] nabla9|9 years ago|reply
Human vision is not a single snapshot.

We have two 'cameras', and they scan the scene by jumping around the image at 20–200 ms intervals. The perceived image is an integration of many of these jumps, and it's constantly changing.

[+] thisisdave|9 years ago|reply
Several of the universal perturbation vectors in Figure 4 remind me a lot of Deep Dream's textures.

I wonder what it is about these high-saturation, stripy-spiraly bits that these networks are responding to.

Is it something inherent in natural images? In the training algorithm? In our image compression algorithms? Presumably, the networks would work better if they weren't so hypersensitive to these patterns, so finding a way to dial that down seems like it could be pretty fruitful.

[+] zo7|9 years ago|reply
My intuition is that these patterns "hijack" the ReLU activations in the lower layers, causing either important features not to fire or features that shouldn't fire to do so. Usually the lower layers learn very primitive shapes like lines and curves, and I think (although I'd need to double-check) that they usually pass through entire color channels rather than nuanced mixings of colors. (So one feature would pass through all of red, or all of blue, or all of both, rather than just 66% red, 47% blue, and 33% green -- if it did the latter, it wouldn't generalize well.) This propagates the error through the network, where the later activations start firing in the wrong places, causing the misclassification.

(This is totally unsubstantiated though)

[+] pfortuny|9 years ago|reply
This is really great and interesting research: (very roughly) how to compute a very small mask which, when applied to any image, makes the neural network misclassify it, whereas humans would notice no essential difference.

Quite remarkable.

[+] hammock|9 years ago|reply
It says these universal vectors are the same across different classifiers. Why would that be?
[+] dkarapetyan|9 years ago|reply
This is why I'm never driving a car that is classifying stuff with neural networks. Some dust, some shitty weather conditions, and that pigeon becomes a green light.
[+] asperous|9 years ago|reply
This wouldn't affect that because the perturbations were specially picked to mess up the network. It wouldn't just happen naturally.

Also, self-driving cars have distance sensors and wouldn't just drive into oncoming traffic because of one sensor anomaly.

[+] jmount|9 years ago|reply
In signal processing you often have to pass the data through some sort of low-pass filter before attempting your analysis. I would be surprised if that isn't one of the methods being tried to protect deep neural nets from some of these attacks. Obviously there are some issues (needing to train on similar data, and such blurring interfering with first-level features that emulate edge-detection and so on).
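A minimal illustration of the low-pass idea (a plain box filter rather than anything tuned; a Gaussian kernel would be the more usual choice, and all parameters here are assumptions): averaging each pixel with its neighborhood attenuates a high-frequency, sign-flipping perturbation far more than it moves the underlying image.

```python
import numpy as np

def low_pass(image, k=3):
    """Box-filter blur as a crude low-pass preprocessing step: replace each
    pixel with the mean of its k x k neighborhood (edge-replicated padding)."""
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.zeros_like(image, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out / (k * k)

rng = np.random.default_rng(3)
clean = rng.normal(size=(8, 8))
# High-frequency perturbation: independent +/-0.05 at every pixel.
perturbed = clean + 0.05 * rng.choice([-1, 1], size=(8, 8))

# Because the filter is linear, the post-blur difference is just the
# blurred perturbation -- and averaging shrinks a sign-flipping signal.
residual = np.abs(low_pass(perturbed) - low_pass(clean)).mean()
print(residual, np.abs(perturbed - clean).mean())
```

This only helps against high-frequency perturbations, which matches the caveat in the comment: the blur also damages the edge-like first-layer features the network relies on.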
[+] nullc|9 years ago|reply
So what happens when you stick this procedure in the training loop? Do you get networks which are robust against doubly-universal perturbations?
[+] dTal|9 years ago|reply
What happens if you include the perturbations in your training data?
[+] dandermotj|9 years ago|reply
If my understanding is correct, the perturbations are inherent in the model, not the data. It's a vulnerability in the high-dimensional decision boundary of neural nets.
[+] jonathanyc|9 years ago|reply
Reminds me a little bit of the short story BLIT [1], where scientists have accidentally created images that crash the human brain. Cool stuff!

[1]: https://en.wikipedia.org/wiki/BLIT_(short_story)

[+] ccvannorman|9 years ago|reply
"Snowcrash" is the more realistic Neal Stephenson version where it gets at the eye-brain-embedded hardware. And of course the original, "the joke so funny that if read or heard would make you laugh yourself to death".

Humans seem really good at being impervious to these, due to millions of years of ignoring things.

[+] amiramir|9 years ago|reply
I'm guessing it won't be long until someone uses this technique to compute and apply perturbation masks to pornographic imagery, making NN-based porn detectors/filters (like the one Yahoo recently open-sourced) a lot less effective.
[+] yodon|9 years ago|reply
Is there reason to think the human visual system is sufficiently well modeled by deep neural nets that our brains might exhibit this same behavior? My first thought was the perturbation images would need to be distinct per person, but photosensitive epilepsy like the Pokémon event [0] might suggest the possibility of shared perturbation vectors.

[0] https://en.m.wikipedia.org/wiki/Photosensitive_epilepsy

[+] nhaliday|9 years ago|reply
What I find interesting is that the labels for the perturbed images aren't completely off in all cases, e.g., wool for a shaggy dog.
[+] javajosh|9 years ago|reply
My science-fiction brain is, of course, interested in this as a method to defeat face-detection in a way humans can't see. I'd like to think that the crew of the Firefly used this technology to avoid detection when they did jobs in the heart of Alliance territory.
[+] oh_sigh|9 years ago|reply
Could you just add noise to any image before passing it through a NN to defeat this kind of attack?
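A sketch of that "add noise" idea with a toy linear classifier (everything here is an illustrative assumption; later literature formalized a version of this as randomized smoothing): classify many independently noised copies and take the majority vote.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy 2-class linear "network" on 10-d inputs.
W = rng.normal(size=(2, 10))

def predict(x):
    return int(np.argmax(W @ x))

def smoothed_predict(x, sigma=0.5, n=100):
    """Majority label over n Gaussian-noised copies of the input: a single
    fixed perturbation now has to survive the added noise on most copies."""
    votes = [predict(x + sigma * rng.normal(size=x.shape)) for _ in range(n)]
    return max(set(votes), key=votes.count)

x = rng.normal(size=10)
print(predict(x), smoothed_predict(x))
```

Whether this defeats a given universal perturbation is an empirical question: the perturbation is a consistent shift, while the noise averages out, so noise alone is not a reliable counter.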
[+] yodon|9 years ago|reply
Can someone help with a notation question? In section 4 of the paper, the norm of the perturbation is constrained to a maximum of 2'000, which presumably is "small", but I don't know how to parse an apostrophe like that.
[+] yodon|9 years ago|reply
Update: later in the paper, the authors mention that 2x10^4 is an order of magnitude larger than 2'000, so perhaps this is just a way of introducing a thousands separator without cultural ambiguity over whether it's a thousands separator or a decimal separator?
[+] bmh100|9 years ago|reply
My intuition is that the existence of adversarial images with barely perceptible differences but a high-confidence misclassification will lead to a new NN architecture for image classification.
[+] mathgenius|9 years ago|reply
This is like Gödel incompleteness for deep learning.