top | item 22461836

CNN-generated images are surprisingly easy to spot for now

196 points | hardmaru | 6 years ago | peterwang512.github.io | reply

98 comments

[+] calibwam|6 years ago|reply
Off topic: Always define your abbreviations. To find out what CNN stands for here, you either have to read a comment thread on HN, or go to the paper and read the introduction. The linked page doesn't even mention neural networks. And as some other commenter here has mentioned, CNN has other more well known meanings than Convolutional Neural Networks.
[+] alias_neo|6 years ago|reply
It was drilled into us in university (engineering) that you spell out abbreviations and acronyms on first use, no matter how well known you think it is.

Some cases I've seen lately seem to forgo this not out of ignorance but as a form of elitism/knowledge gatekeeping.

[+] bilekas|6 years ago|reply
Genuinely thought this was some reference to something on the topic of 'fake news'. Abbreviations are great if you're using the term multiple times. Not as an intro.
[+] usmannk|6 years ago|reply
This is a paper that was published in CVPR (Conference on Computer Vision and Pattern Recognition). In that context it is unambiguous that CNN means Convolutional Neural Networks.
[+] OJFord|6 years ago|reply
It seems fair enough in the title to me, but not spelling it out on first use in the abstract is poor IMO.

Edit: Although, I see it does in first use in the introduction, so maybe that's just conforming to whoever's style guide.

[+] kick|6 years ago|reply
The character limit is 80. Many abbreviations on the front-page are almost certainly caused by the low character limit making it difficult to express concepts that don't have a singular word for them.
[+] nl|6 years ago|reply
This github page is intended as an appendix to the paper[1] which does define it:

"However, these methods represent only two instances of a broader set of techniques: image synthesis via convolutional neural networks (CNNs)."

[1] https://arxiv.org/pdf/1912.11035.pdf

[+] SkyBelow|6 years ago|reply
I think it depends upon the audience and the work being written. Anything on the level of a news article definitely should spell it out, but a forum post about some game can get away with the common acronyms used by the community. It is part of knowing your audience.
[+] mbostleman|6 years ago|reply
Yes, with an abbreviation like CNN it is remarkably presumptuous not to define it in this article. I clicked the headline specifically because CNN was in the title and I assumed it referred to the news network.
[+] ropiwqefjnpoa|6 years ago|reply
Seriously, what does CNN mean for most people, even on HN?

Would this be receiving as much attention if they had used "Convolutional Neural Networks" instead of just CNN?

[+] mikorym|6 years ago|reply
I thought this was about the news channel.
[+] dillonmckay|6 years ago|reply
What does HN abbreviate?
[+] Grimm1|6 years ago|reply
I work a decent bit with ML, and even so, having just woken up, I read that as the news network. So apparently it's pretty easy to confuse.
[+] currymj|6 years ago|reply
it’s a scientific paper from a computer vision conference, it would be absurd in that context to assume anyone reading it doesn’t know that it stands for convolutional neural network. they didn’t write this with Hacker News in mind.
[+] danmg|6 years ago|reply
It's not ambiguous which CNN they meant from the rest of the words in the title.
[+] amelius|6 years ago|reply
Also, the convolution part is only a speedup thing. You can do very similar neural network operations without the convolution, except that everything will be much slower and you'd need a lot more memory.
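The parent's point can be made concrete with a toy NumPy sketch (a made-up 1-D example, not from the paper): a convolution layer is just a dense layer whose weight matrix is sparse and weight-shared, so an unconstrained dense layer can compute the same thing at the cost of far more parameters and memory.

```python
import numpy as np

# A "convolution" layer (cross-correlation, as in CNNs) is a matrix
# multiply with a structured, weight-shared matrix.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([0.5, -1.0, 0.25])          # 3-tap filter

# Direct sliding-window computation ('valid' mode: 3 outputs)
conv_out = np.array([x[i:i+3] @ w for i in range(len(x) - 2)])

# Equivalent dense layer: a (3 x 5) matrix with the same 3 weights
# repeated along the diagonal, zeros elsewhere. A free dense layer
# would store all 15 entries instead of 3 shared weights.
M = np.zeros((3, 5))
for i in range(3):
    M[i, i:i+3] = w
dense_out = M @ x

assert np.allclose(conv_out, dense_out)
```

The equality holds by construction; the memory cost of the dense form grows with input size, which is the "much more memory" the parent mentions.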
[+] blueblisters|6 years ago|reply
I wonder if these results hold when the CNN-generated images are converted to an analog medium and back to digital (say scanning a printout or taking a screencap).

If not, this might indicate that the fingerprints or artifacts left by the generators are not of the "perceptible" variety.

Also a discriminator trained from this experiment might be useful to train a more powerful generator.

[+] manthideaal|6 years ago|reply
From the paper:

(1) We show that when the correct steps are taken, classifiers are indeed robust to common operations such as JPEG compression, blurring, and resizing.

(2) When using Photoshop-like methods the detector performs at chance (is useless).
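As a rough illustration of point (1), training-time augmentation can be sketched in plain NumPy. The blur and downsample here are minimal stand-ins (all names and parameters are made up for illustration); a real pipeline would also include JPEG re-compression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly degrade training images so the detector learns cues that
# survive common post-processing (blur, resizing, compression).
def blur(img, k=3):
    # Simple box blur via an explicit sliding window
    kernel = np.ones((k, k)) / (k * k)
    pad = np.pad(img, k // 2, mode="edge")
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (pad[i:i+k, j:j+k] * kernel).sum()
    return out

def downsample(img, factor=2):
    # Crude resize stand-in: keep every `factor`-th pixel
    return img[::factor, ::factor]

def augment(img):
    if rng.random() < 0.5:
        img = blur(img)
    if rng.random() < 0.5:
        img = downsample(img)
    return img

batch = [augment(rng.random((16, 16))) for _ in range(4)]
```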

[+] p4bl0|6 years ago|reply
Exactly what I wondered reading the abstract.

Also, if the images are not recognizable as fakes by humans, then it's good enough. What would be interesting in going further than that? I actually see it as a feature if at the same time it's possible to prove when images are fake.

[+] leod|6 years ago|reply
Interesting. They train an image classifier to detect images that were generated by a GAN-trained CNN. I wonder if it could be possible to include this classifier in the training loss, such that the generated images fly under its radar as much as possible. If this makes sense, then I guess the cat-and-mouse game just gained another level. On the other hand, what the classifier is detecting could be a fingerprint of the CNN architecture itself.

(Full disclosure: I have only read the abstract so far.)
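The idea above amounts to adding the detector as an extra term in the generator's loss. A minimal sketch, with toy stand-in functions in place of real differentiable networks (`gan_discriminator`, `forensic_classifier`, and the weight `lam` are all hypothetical names, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: in a real setup these would be the GAN's own
# discriminator and the forensic classifier from the paper, both
# differentiable networks rather than hand-written functions.
def gan_discriminator(img):
    # "probability the image is real", per the GAN's critic
    return 1.0 / (1.0 + np.exp(-img.mean()))

def forensic_classifier(img):
    # "probability the image is CNN-generated", per the detector
    return 1.0 / (1.0 + np.exp(-img.std()))

def generator_loss(fake_img, lam=0.5):
    # Standard non-saturating GAN term: fool the discriminator...
    adv = -np.log(gan_discriminator(fake_img) + 1e-8)
    # ...plus a term penalizing being flagged by the forensic detector.
    evade = -np.log(1.0 - forensic_classifier(fake_img) + 1e-8)
    return adv + lam * evade

fake = rng.standard_normal((8, 8))
loss = generator_loss(fake)
```

Minimizing the second term is exactly the "fly under its radar" objective; as skinner_ notes below in the thread, a differently trained detector may still catch the result.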

[+] NoodleIncident|6 years ago|reply
> Due to the difficulties in achieving Nash equilibria, none of the current GAN-based architectures are optimized to convergence, i.e. the generator never wins against the discriminator.

If I understand the terms used, it sounds like you're suggesting adding this classifier to the discriminator, to avoid detection. Since they are already failing to pass their existing discriminators, it seems like they could try to not be detected, but they wouldn't actually succeed.

[+] slipheen|6 years ago|reply
I'm not particularly familiar with neural nets, so forgive a rather ignorant question.

Could the classifier that they're using here be used as a discriminator in a GAN, to help train it to avoid this detection method?

[+] skinner_|6 years ago|reply
Absolutely possible, might even be a good idea, but my expectation is that the results won't be robust: the fakes will be uncovered by a slightly differently trained classifier. Maybe even the same classifier with a different random initialization.
[+] manmal|6 years ago|reply
Maybe avoiding detection would make the generated images look more realistic as a side effect.
[+] sdan|6 years ago|reply
Sounds possible
[+] manthideaal|6 years ago|reply
I have read the paper and there are plenty of useful references and points: related work, the 11 CNN-based image generator models, and the discussion section.

But sadly I could not obtain a clear picture of the difference between their detector and a baseline one. There are some minor points and references about upsampling, downsampling, resizing, cropping, and Fourier spectra comparison across generators, but those seem to be comments and comparisons rather than crucial points in the construction of the detector. Furthermore, data augmentation doesn't play a big role; they say it usually improves the detector a little.

As a math person I like to get more meat from papers, but here it seems that little tricks allow them to win the game. Perhaps that is the way (little or no math involved) to make advances. Well, at least they say that shallow methods modify the fingerprint of the Fourier spectra, so that you can't detect which generator produced the image.

Perhaps the word "universal" was what captured my attention.

[+] pgodzin|6 years ago|reply
If this "universal detector" is now used as a discriminator and the original models are fine-tuned/re-trained then it will stop being a universal detector no?
[+] kfuwbi2640|6 years ago|reply
I’ve seen a number of attempts to identify deepfakes and other forms of manipulated images using AI. This seems like a fool’s errand, since it becomes a never-ending adversarial AI arms race.

What I haven’t seen, though, is a proposal for a system I think could work well: camera and phone manufacturers could have their devices cryptographically sign each photo or video taken. And that’s it. From that starting place, you can build a system on top of it to verify that the image on the site you’re reading is authentic. What am I missing that makes this an invalid approach?

I do understand that this would require manufacturers to implement it, but it seems achievable to get them on board. I even think if you get one company like Apple to do this, that’s enough traction that the rest of the industry would have to follow suit.
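The flow being proposed can be sketched with Python's standard library. An HMAC stands in for the signature here for brevity (a hypothetical simplification, not the proposal itself); a real deployment would use an asymmetric scheme such as Ed25519, with the private key in the device's secure element so verifiers never hold a secret.

```python
import hashlib
import hmac
import os

# In a real camera this key would live in tamper-resistant hardware.
device_key = os.urandom(32)

def sign_photo(image_bytes: bytes) -> bytes:
    # Hash the raw image, then authenticate the digest with the device key
    digest = hashlib.sha256(image_bytes).digest()
    return hmac.new(device_key, digest, hashlib.sha256).digest()

def verify_photo(image_bytes: bytes, signature: bytes) -> bool:
    digest = hashlib.sha256(image_bytes).digest()
    expected = hmac.new(device_key, digest, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)

photo = b"\x89PNG...raw sensor data..."
sig = sign_photo(photo)
assert verify_photo(photo, sig)                 # untouched photo verifies
assert not verify_photo(photo + b"edit", sig)   # any modification fails
```

Note this only proves the bytes came from a signing device; it says nothing about what was in front of the lens, which is one practical objection to the scheme.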

[+] Shivetya|6 years ago|reply
Does it matter that they are easy to spot when the damage they can do would be well underway before a trusted service invalidates the image?

I am coming at this from the angle of: who would use this type of service other than the courts? Certainly major news organizations could benefit, but we have numerous recent examples where they have not only run with CNN-generated imagery but also purposefully run video and images of similar events to portray the view they wanted for a current event.

Of course, in the end, if the end game is to have news, image, and video validation, there will need to be more than one such service, in separate enough areas of the world, to have some chance that they would not all be intimidated/infiltrated to the point of being untrustworthy.

[+] ThePowerOfFuet|6 years ago|reply
Spoiler alert: this has nothing to do with the Cable News Network.
[+] ash|6 years ago|reply
What does CNN stand for in this case?
[+] techntoke|6 years ago|reply
Well, they've both been known to generate altered images that people try to pass off as originals, so they relate in some ways.
[+] andy_ppp|6 years ago|reply
Surely if you want you can train the network to produce images that are not easily detectable?

So:

1) train a network that can detect CNN generated images

2) train the CNN network to generate whatever you want, politicians in compromising positions, etc., but also add in weights against the other network

3) Images won't be easy to spot...

People will obviously start writing CNNs that detect images generated and obfuscated this way, but still, it's all possible.

[+] guidopallemans|6 years ago|reply
What you describe is exactly the way these models work!

Typically, a GAN (Generative Adversarial Network) consists of (1) the generator, a model generating images, and (2) the discriminator, a model that learns whether the images it is fed come from the generator or from the real image dataset. The (gradient) information of how the discriminator made its decision is fed back into the generator, in order to help it learn how to generate more _real_ images.

The discriminator is what you describe in step 1, and the generator is your step 2.

[+] stared|6 years ago|reply
It seems that you just discovered GANs. :)
[+] DannyB2|6 years ago|reply
Arms race. First an AI can generate synthesized images.

Next, it is possible to have a test which detects those. And that test can be improved by better training.

Then the image synthesizer is trained against the fake image detector, which can initially spot its output, until it learns how to fool the detector.

Then the fake image detector is improved by training it against the improved fake image synthesizer.

Repeat.

[+] darawk|6 years ago|reply
This is the premise of GANs. It seems to me that the tech already exists to make extremely hard-to-spot fakes using GANs; it's just that nobody has bothered to write the code to do it.
[+] manthideaal|6 years ago|reply
My understanding from the discussion in section 5 of the linked paper is that the GAN could be modified so that the relative power of the discriminator and generator is fine-tuned to generate hard-to-detect images, by giving more power to the discriminator in the final steps.
[+] adversary10450|6 years ago|reply
How is this different from an adversarial network?
[+] pmelendez|6 years ago|reply
To my knowledge, adversarial networks are actually two different networks, one “correcting” the output of the other. A CNN, on the other hand, is just one architectural model that internally uses convolutions.
[+] a3n|6 years ago|reply
Article on CNN-generated images being surprisingly easy to spot is surprisingly difficult to read on mobile ... for now.
[+] villgax|6 years ago|reply
Uber had this CoordConv thing to create better CNN-generated images; maybe that could fool this detector?
[+] nl|6 years ago|reply
No, that's 2 years old at this point. The tech has moved on a lot since then.
[+] Chinjut|6 years ago|reply
Yeah, from their logo in the corner!!
[+] thinkloop|6 years ago|reply
I might argue that the only reason this is on the front page is because of the confusion surrounding "CNN"