Has anyone tried the same adversarial examples against many different DNNs? I would think these are fairly brittle attacks in reality and only effective with some amount of inside knowledge.
Yes. It is possible to generate one adversarial example that defeats multiple machine learning models -- this is the transferability property.
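A minimal sketch of what transferability means, using two made-up linear classifiers as stand-ins for two separately trained DNNs (all weights and inputs here are invented for illustration): an FGSM-style perturbation computed only from the model the attacker can see also flips the prediction of a second model it was never computed against.

```python
import numpy as np

# Hypothetical toy setup: two independently "trained" linear classifiers
# standing in for two different DNNs. Weights are made up for illustration.
w_a = np.array([1.0, 0.8, -0.5])   # surrogate model the attacker can inspect
w_b = np.array([0.9, 1.0, -0.4])   # victim model the attacker never sees

def predict(w, x):
    return int(w @ x > 0)          # 1 = "positive" class

x = np.array([0.3, 0.2, -0.1])     # clean input, classified positive by both
assert predict(w_a, x) == 1 and predict(w_b, x) == 1

# FGSM-style step: perturb against the sign of the surrogate's gradient.
# For a linear model, the gradient of the logit w.r.t. x is just w_a.
eps = 0.3
x_adv = x - eps * np.sign(w_a)

# The example crafted on w_a also fools w_b -- the transferability property.
print(predict(w_a, x_adv), predict(w_b, x_adv))
```

Because the two models learn correlated decision boundaries, a step that crosses one boundary tends to cross the other as well, which is the intuition usually given for why transfer works at all.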
dijksterhuis|2 years ago

Making examples that transfer between multiple models can affect "perceptibility", i.e. how much of a change/delta/perturbation is required to make the example work.

But this is highly dependent on the model domain. Speech-to-text transferability is MUCH harder than image classification transferability, requiring significantly larger changes and yielding lower transfer accuracy.

I'm pretty sure there were some transferable attacks generated in a black-box threat model. But I might be wrong on that and cba to search through arxiv right now.

edit: https://youtu.be/jD3L6HiH4ls?feature=shared&t=1779

whoami_nr|2 years ago

Author here. Some of them are black-box attacks (like the one where they extract the training data from the model), and one was demonstrated against an Amazon cloud classifier, which big companies regularly use. So I wouldn't say that these attacks are entirely impractical and purely a research endeavour.
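The perceptibility point above can be made concrete with a toy sweep: find the smallest L-infinity budget eps whose FGSM-style perturbation fools the surrogate model the attacker can see, versus the smallest eps that also fools an unseen victim model. Everything here (weights, input, step size) is invented for illustration; the only point is that transfer to a misaligned model tends to demand a larger, more perceptible perturbation.

```python
import numpy as np

# Toy illustration of the "perceptibility" trade-off. All weights invented.
w_a = np.array([1.0, 0.8, -0.5])    # surrogate (white-box) model
w_b = np.array([0.9, -0.3, -0.5])   # unseen victim, partially misaligned
x = np.array([0.3, 0.2, -0.1])      # clean input, positive under both models

def flips(w, eps):
    # FGSM-style perturbation computed from the surrogate's gradient (w_a)
    x_adv = x - eps * np.sign(w_a)
    return w @ x_adv < 0            # True if the prediction flips to negative

def min_eps(w):
    # Smallest eps (in 0.01 steps) whose perturbation flips model w
    return next(e for e in (round(0.01 * i, 2) for i in range(1, 101))
                if flips(w, e))

eps_a = min_eps(w_a)   # budget needed to fool the model we can see
eps_b = min_eps(w_b)   # larger budget needed before the attack transfers
print(eps_a, eps_b)
```

In image classification the extra budget often stays imperceptible; per the comment above, in speech-to-text the gap is much larger, so transferred audio examples need audible distortion and still transfer less reliably.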