
stevenwalton | 1 year ago

> But the constraint that both eyes should have consistent reflection patterns is just another statistical regularity that appears in real photographs

Hi, author here of a model that does really well on this[0]. My model is SOTA and has undergone a third-party user study showing it generates convincing images of faces[1]. AND my undergrad is in physics. I'm not saying this to brag; I'm giving my credentials: I have deep knowledge both in generating realistic human faces and in physics. I've seen hundreds of thousands of generated faces from many different models and architectures.

I can assure you, these models don't know physics. What you're seeing is the result of attention. Go ahead and skip the front matter in my paper and go look at the appendix where I show attention maps and go through artifacts.
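To make "it's attention, not physics" concrete, here is a minimal sketch (my own toy illustration, not the paper's code; the grid size, dimensions, and the "eye patch" index are all made up) of how a single self-attention map is computed over image patches. The model copies feature statistics between patches it attends to — which is why the two eyes can end up with *correlated* reflections — but there is no lighting model anywhere in the computation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: an 8x8 grid of image patches, each embedded as a 16-dim token.
# (Hypothetical sizes; real models use far larger grids and dimensions.)
n_side, d = 8, 16
tokens = rng.normal(size=(n_side * n_side, d))

# Standard scaled dot-product self-attention scores: softmax(Q K^T / sqrt(d)).
Wq = rng.normal(size=(d, d))
Wk = rng.normal(size=(d, d))
Q, K = tokens @ Wq, tokens @ Wk
scores = Q @ K.T / np.sqrt(d)
scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
attn = np.exp(scores)
attn /= attn.sum(axis=-1, keepdims=True)      # each row sums to 1

# The attention map for one query patch (say, a "left eye" patch) is just
# its row of `attn`, reshaped back onto the patch grid. High values mark
# the patches whose features it draws from -- e.g. the other eye -- so
# reflections correlate without any physical consistency being enforced.
query_idx = 2 * n_side + 2  # arbitrary patch chosen for illustration
eye_map = attn[query_idx].reshape(n_side, n_side)
print(eye_map.shape)  # (8, 8)
```

Visualizing rows of `attn` this way (upsampled and overlaid on the image) is what the appendix figures show: statistical copying between regions, not a physical model of light.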

Yes, the work is GANs, but the same principles apply to diffusion models; diffusion models are just typically MUCH bigger and have way more training data (sure, I had access to an A100 node at the time, but even one node makes you GPU poor these days, so it's best to explore on GANs).

I'll point out flaws in images in my paper, but remember that these images fool people, and you're now primed to see errors; if you continue reading you'll be even further informed. In Figures 8-10 you can see the "stars" that the article talks about. You'll see mine does a lot better, but the artifact exists in all the images. You can also see these errors in all of the images in the header, though they are much harder to see. I did embed the images as large as I could into the paper, so you can zoom in quite a bit.

Now, there are ways to detect deepfakes pretty readily, but it does take an expert eye. These aren't the days of StyleGAN2, where monsters were common (well... at least for GANs; diffusion is getting there). Each model and architecture has a unique signature, but there are key things you can look for if you want to get better at this. Here are the things I look for; I've used these to identify real-world fake profiles, and you will see them across Twitter and elsewhere:

- Eyes: Eyes are complex in humans, with lots of texture. Look for "stars" (inconsistent lighting), pupil dilation, pupil shape, heterochromia (can be subtle; see Figure 2, last row, column 2 for example), and the texture of the iris. Also make sure to look at the edges of the eyes (Figs 8-10).

- Glasses: Look for aberrations and inconsistent lighting/reflections, and pay very close attention to the edges, where new textures can be created.

- Necks: These are just never right: the skin wrinkles, shape, angles, etc.

- Ears: These always lose detail (as seen in TFA and my paper), lose symmetry in shape, and are often not lit correctly; if there are earrings, watch for the same issues there too (see TFA).

- Hair: Dear fucking god, it is always the hair, though I think most people might not notice this at first. If you're having trouble, start by looking at the strands, beginning with Figure 8: patches are weird, colors change, and texture and direction are off, among other things. Then try Fig 9 and TFA.

- Backgrounds: I joke that the best indicator of a good-quality image is how much it looks like a LinkedIn headshot. I have yet to see a generated photo where things happening in the background are free of errors, both long-range and local. Look at my header image with care: the bottom image in row 2 is pretty good but has errors, row 2 column 4 has them too, and even the shadow in row 1, column 4 doesn't make sense.

- Phase Artifacts: These were discussed back in the StyleGAN2 paper (Fig 6) and are still common today.

- Skin texture: Without fail, unrealistic textures are created on faces. These are hard to use in the wild, though, because you're typically seeing a compressed image, which creates artifacts of its own, and you frequently need to zoom in to see them. They can be more apparent with post-processing, though.
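A crude automated complement to the eyeballing above (my own generic illustration; the function name, cutoff, and toy inputs are all made up, and a single number like this is NOT a reliable detector): upsampling layers in generators often leave periodic high-frequency fingerprints, which a quick FFT energy ratio can hint at:

```python
import numpy as np

def highfreq_ratio(gray: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of spectral energy above `cutoff` of the Nyquist radius.

    Purely illustrative: generator upsampling can leave high-frequency
    fingerprints, but compression adds its own, so treat this as a hint.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    power = np.abs(spectrum) ** 2
    h, w = gray.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # normalized radial frequency measured from the spectrum's center
    r = np.hypot((yy - h / 2) / (h / 2), (xx - w / 2) / (w / 2))
    return float(power[r > cutoff].sum() / power.sum())

rng = np.random.default_rng(0)
smooth = rng.normal(size=(64, 64)).cumsum(0).cumsum(1)  # low-frequency-heavy
noisy = rng.normal(size=(64, 64))                       # flat spectrum
print(highfreq_ratio(smooth) < highfreq_ratio(noisy))   # True
```

In practice you'd compare a suspect image's spectrum against known-real ones from the same camera/compression pipeline, since JPEG artifacts shift these numbers too.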

There's more, but all of these are a result of models not knowing physics. If you are just scrolling through Twitter you won't notice many of these issues, but if you slow down and study an image, they become apparent. If you practice looking, you'll quickly learn to find the errors with little effort. I could be more specific about model differences, but this comment is already too long. I could also go into detail about why our metrics can't catch these errors, but that's a whole other lengthy comment.

[0] https://arxiv.org/abs/2211.05770

[1] https://arxiv.org/abs/2306.04675
