
Universal adversarial perturbations

108 points | legatus | 9 years ago | arxiv.org | reply

46 comments

[+] legatus|9 years ago|reply
Abstract: Given a state-of-the-art deep neural network classifier, we show the existence of a universal (image-agnostic) and very small perturbation vector that causes natural images to be misclassified with high probability. We propose a systematic algorithm for computing universal perturbations, and show that state-of-the-art deep neural networks are highly vulnerable to such perturbations, albeit being quasi-imperceptible to the human eye. We further empirically analyze these universal perturbations and show, in particular, that they generalize very well across neural networks. The surprising existence of universal perturbations reveals important geometric correlations among the high-dimensional decision boundary of classifiers. It further outlines potential security breaches with the existence of single directions in the input space that adversaries can possibly exploit to break a classifier on most natural images.
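The paper's algorithm is iterative; as a rough illustration, here is a toy sketch of the accumulation loop with a random linear "classifier" standing in for a deep net (the paper's actual method aggregates DeepFool steps; the gradient nudge, step size, and norm radius below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a classifier: a fixed 2-class linear model on 10-d inputs.
W = rng.normal(size=(2, 10))

def predict(x):
    return int(np.argmax(W @ x))

def project_l2(v, radius):
    """Project v back onto the L2 ball of the given radius."""
    n = np.linalg.norm(v)
    return v if n <= radius else v * (radius / n)

def universal_perturbation(images, radius=0.5, step=0.1, epochs=5):
    """Greedy loop in the spirit of the paper: for each image the current
    shared perturbation fails to fool, nudge the perturbation toward the
    decision boundary, then project back onto the norm ball."""
    v = np.zeros(images.shape[1])
    for _ in range(epochs):
        for x in images:
            if predict(x + v) == predict(x):
                # Direction that flips the linear score -- a crude stand-in
                # for the per-image DeepFool step used in the paper.
                d = W[1 - predict(x)] - W[predict(x)]
                v = project_l2(v + step * d / np.linalg.norm(d), radius)
    return v

images = rng.normal(size=(20, 10))
v = universal_perturbation(images)
fooled = sum(predict(x + v) != predict(x) for x in images)
print(f"fooling rate: {fooled / len(images):.0%}")
```

The key ingredients match the paper: loop over a set of images, update the single shared perturbation whenever it fails on the current image, and keep its norm small by projection.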
[+] kough|9 years ago|reply
Super interesting. I'm on mobile and haven't had time to read the whole paper yet - would it be feasible to continuously compute these perturbation vectors during training and include them as part of a larger heuristic? For instance, to incorporate the objective of maximizing the size of the perturbation vector necessary for misclassification? The goal being to end up with a net that is more resistant to such perturbations.
[+] danbruc|9 years ago|reply
This seems to imply that the features learnt by neural networks are very different from the features humans use to distinguish the same objects, because the networks are affected by distortions that barely interfere with the features humans rely on.
[+] danieltillett|9 years ago|reply
One thing is that neural networks are much smaller than human brains and most likely have far fewer overlapping redundant systems. If you had three separate neural networks that voted on a consensus, you might find it much harder to find adversarial inputs.
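A sketch of that consensus scheme, with toy linear models standing in for the three networks (everything here is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(2)

# Three independently initialized "networks" for the same 2-class task.
models = [rng.normal(size=(2, 10)) for _ in range(3)]

def predict(model, x):
    return int(np.argmax(model @ x))

def consensus(x):
    """Majority vote across the ensemble: an adversarial perturbation now
    has to fool at least two of the three models simultaneously."""
    votes = [predict(m, x) for m in models]
    return max(set(votes), key=votes.count)

x = rng.normal(size=10)
print(consensus(x))
```

Note, though, that the paper's finding that perturbations transfer well across architectures cuts against this defense: if the models share vulnerable directions, the vote may not help much.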
[+] nabla9|9 years ago|reply
Human vision is not a single snapshot.

We have two 'cameras', and they scan the scene by jumping around the image at 20–200 ms intervals. The perceived image is an integration of many of these jumps, and it's constantly changing.

[+] thisisdave|9 years ago|reply
Several of the universal perturbation vectors in Figure 4 remind me a lot of Deep Dream's textures.

I wonder what it is about these high-saturation, stripy-spiraly bits that these networks are responding to.

Is it something inherent in natural images? In the training algorithm? In our image compression algorithms? Presumably, the networks would work better if they weren't so hypersensitive to these patterns, so finding a way to dial that down seems like it could be pretty fruitful.

[+] zo7|9 years ago|reply
My intuition is that these patterns "hijack" the ReLU activations in the lower layers, causing either important features not to fire or features that shouldn't fire to do so. Usually the lower layers learn very primitive shapes like lines and curves, and I think (although I'd need to double-check) that they usually pass through entire color channels rather than nuanced mixings of colors. (So one feature would pass through all of red, or all of blue, or all of both, rather than just 66% red, 47% blue, and 33% green -- if it did the latter, it wouldn't generalize well.) This propagates the error through the network, where the later activations start firing in the wrong places, causing the misclassification.

(This is totally unsubstantiated though)

[+] pfortuny|9 years ago|reply
This is really great and interesting research: (very roughly) how to compute a very small mask which, when applied to any image, makes the neural network misclassify it, whereas humans would notice no essential difference.

Quite remarkable.

[+] hammock|9 years ago|reply
It says these universal vectors are the same across different classifiers. Why would that be?
[+] dkarapetyan|9 years ago|reply
This is why I'm never driving a car that is classifying stuff with neural networks. Some dust, some shitty weather conditions, and that pigeon becomes a green light.
[+] asperous|9 years ago|reply
This wouldn't affect that because the perturbations were specially picked to mess up the network. It wouldn't just happen naturally.

Also, self-driving cars have distance sensors and wouldn't just drive into oncoming traffic because of one sensor anomaly.

[+] jmount|9 years ago|reply
In signal processing you often have to pass the data through some sort of low-pass filter before attempting your analysis. I would be surprised if that isn't one of the methods being tried to protect deep neural nets from some of these attacks. Obviously there are some issues (needing to train on similar data, and such blurring interfering with first-level features that emulate edge-detection and so on).
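A minimal illustration of the low-pass idea (a plain box filter rather than anything tuned; a Gaussian kernel would be the more usual choice, and all parameters here are assumptions): averaging each pixel with its neighborhood attenuates a high-frequency, sign-flipping perturbation far more than it moves the underlying image.

```python
import numpy as np

def low_pass(image, k=3):
    """Box-filter blur as a crude low-pass preprocessing step: replace each
    pixel with the mean of its k x k neighborhood (edge-replicated padding)."""
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.zeros_like(image, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out / (k * k)

rng = np.random.default_rng(3)
clean = rng.normal(size=(8, 8))
# High-frequency perturbation: independent +/-0.05 at every pixel.
perturbed = clean + 0.05 * rng.choice([-1, 1], size=(8, 8))

# Because the filter is linear, the post-blur difference is just the
# blurred perturbation -- and averaging shrinks a sign-flipping signal.
residual = np.abs(low_pass(perturbed) - low_pass(clean)).mean()
print(residual, np.abs(perturbed - clean).mean())
```

This only helps against high-frequency perturbations, which matches the caveat in the comment: the blur also damages the edge-like first-layer features the network relies on.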
[+] nullc|9 years ago|reply
So what happens when you stick this procedure in the training loop? Do you get networks which are robust against doubly-universal perturbations?
[+] dTal|9 years ago|reply
What happens if you include the perturbations in your training data?
[+] dandermotj|9 years ago|reply
If my understanding is correct, the perturbations are inherent in the model, not the data. It's a vulnerability in the high-dimensional decision boundary of neural nets.
[+] jonathanyc|9 years ago|reply
Reminds me a little bit of the short story BLIT [1], where scientists have accidentally created images that crash the human brain. Cool stuff!

[1]: https://en.wikipedia.org/wiki/BLIT_(short_story)

[+] ccvannorman|9 years ago|reply
"Snowcrash" is the more realistic Neal Stephenson version where it gets at the eye-brain-embedded hardware. And of course the original, "the joke so funny that if read or heard would make you laugh yourself to death".

Humans seem really good at being impervious to these, due to millions of years of ignoring things.

[+] amiramir|9 years ago|reply
I'm guessing it won't be long until someone uses this technique to compute and apply perturbation masks to pornographic imagery, making NN-based porn detectors/filters (like the one Yahoo recently open-sourced) a lot less effective.
[+] yodon|9 years ago|reply
Is there reason to think the human visual system is sufficiently well modeled by deep neural nets that our brains might exhibit this same behavior? My first thought was the perturbation images would need to be distinct per person, but photosensitive epilepsy like the Pokémon event [0] might suggest the possibility of shared perturbation vectors.

[0] https://en.m.wikipedia.org/wiki/Photosensitive_epilepsy

[+] nhaliday|9 years ago|reply
What I find interesting is that the labels for the perturbed images aren't completely off in all cases, e.g., wool for a shaggy dog.
[+] javajosh|9 years ago|reply
My science-fiction brain is, of course, interested in this as a method to defeat face-detection in a way humans can't see. I'd like to think that the crew of the Firefly used this technology to avoid detection when they did jobs in the heart of Alliance territory.
[+] oh_sigh|9 years ago|reply
Could you just add noise to any image before passing it through a NN to defeat this kind of attack?
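A sketch of that "add noise" idea with a toy linear classifier (everything here is an illustrative assumption; later literature formalized a version of this as randomized smoothing): classify many independently noised copies and take the majority vote.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy 2-class linear "network" on 10-d inputs.
W = rng.normal(size=(2, 10))

def predict(x):
    return int(np.argmax(W @ x))

def smoothed_predict(x, sigma=0.5, n=100):
    """Majority label over n Gaussian-noised copies of the input: a single
    fixed perturbation now has to survive the added noise on most copies."""
    votes = [predict(x + sigma * rng.normal(size=x.shape)) for _ in range(n)]
    return max(set(votes), key=votes.count)

x = rng.normal(size=10)
print(predict(x), smoothed_predict(x))
```

Whether this defeats a given universal perturbation is an empirical question: the perturbation is a consistent shift, while the noise averages out, so noise alone is not a reliable counter.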
[+] yodon|9 years ago|reply
Can someone help with a notation question? In section 4 of the paper, the norm of the perturbation is constrained to a maximum of 2'000, which presumably is "small", but I don't know how to parse an apostrophe like that.
[+] yodon|9 years ago|reply
Update: later in the paper, the authors mention that 2x10^4 is an order of magnitude larger than 2'000, so perhaps this is just a way of introducing a thousands separator without cultural ambiguity over whether it's a thousands separator or a decimal separator?
[+] bmh100|9 years ago|reply
My intuition is that the existence of adversarial images with barely perceptible differences but a high-confidence misclassification will lead to a new NN architecture for image classification.
[+] mathgenius|9 years ago|reply
This is like Gödel incompleteness for deep learning.