
What a Deep Neural Network thinks about selfies

262 points | vkhuc | 10 years ago | karpathy.github.io | reply

50 comments

[+] lqdc13|10 years ago|reply
A guide on how to take a good selfie that others will like:

  be female
  be blonde
  be attractive
Incidentally, Christian Rudder did a really good "study" on dating-site pictures a few years ago:

http://blog.okcupid.com/index.php/dont-be-ugly-by-accident/

[+] steve_taylor|10 years ago|reply
A better guide on how to take a selfie:

    Don't take a selfie.
[+] tdaltonc|10 years ago|reply
And, if you are female, chop off your forehead.
[+] gus_massa|10 years ago|reply
Also, long hair in front of your shoulders (no ponytail).
[+] bakhy|10 years ago|reply
apparently, she should be white too.
[+] danblick|10 years ago|reply
This is neat. I bet Facebook or OkCupid are sitting on all sorts of click data that could be used to develop tools for helping people make their photos look better. (Even if, personally, I can't wait for a cultural backlash against internet narcissism...)

[Edit: Even better, he didn't use click data to train the model, just public likes.]

[+] visarga|10 years ago|reply
The idea to use a convnet to reframe the selfie is neat. Makes it 5% better. Also, if it can be run on the phone, it could possibly warn people they are about to post a shitty selfie before they do.
[+] anunderachiever|10 years ago|reply
I would like to see a deep dream selfie ...

Feed it an initial picture (noise, clouds, a selfie) and then backwards manipulate the input to maximize the assessed quality of the "selfie".

I guess that would look pretty funny.
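The input-maximization idea above can be sketched with plain NumPy, using a toy hand-made linear "quality" scorer in place of the real ConvNet (the scorer and all names here are illustrative, not Karpathy's code):

```python
import numpy as np

def quality(img, w):
    # toy differentiable "selfie quality" score: weighted sum of pixels
    return float(np.sum(w * img))

def ascend(img, w, lr=0.1, steps=100):
    # gradient ascent on the *input*: for this linear scorer the
    # gradient w.r.t. the image is just the weight map w
    x = img.copy()
    for _ in range(steps):
        x += lr * w               # step uphill on the score
        x = np.clip(x, 0.0, 1.0)  # stay in a valid pixel range
    return x

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8))  # stand-in for the net's preferences
start = rng.random((8, 8))       # the initial "selfie" (here just noise)
out = ascend(start, w)
```

With a real ConvNet the gradient would come from backpropagating the score to the input, which is exactly how DeepDream-style images are produced.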

[+] Tyr42|10 years ago|reply
He did run something like that for cropping. He showed his favourite two "rude" ones at the bottom, where the 'Net cropped out the face of the person taking the selfie.
[+] misiti3780|10 years ago|reply
One thing I always found interesting is that LeCun is credited with developing convnets, but Hinton is apparently credited with scaling them and showing the world how great they are in the 2012 paper. Why was Hinton's group (Toronto) able to publish these groundbreaking results before LeCun's group (NYU)?
[+] pramodliv1|10 years ago|reply
Geoff Hinton answers this question in episode 6 of the Talking Machines podcast. http://www.thetalkingmachines.com/blog/2015/3/13/how-machine...

Geoff Hinton had grad students who wanted to work on the problem, but Yann LeCun didn't.

"In about 2012, it should have been Yann's group, but Yann was unlucky, he didn't have a student who really wanted to do it. But we had a couple of students who wanted to do it and we took all of Yann's techniques and added some of our own."

[+] Houshalter|10 years ago|reply
IIRC the deep learning revolution started with pretraining and RBMs, which I believe Hinton invented.
[+] nightpool|10 years ago|reply
>Be female. Women are consistently ranked higher than men. In particular, notice that there is not a single guy in the top 100.

This sounds true, but it can't be the real reason—selfies are ranked relative to the other images by the same user. So unless users are taking a lot of #selfies of people of different genders, we can assume the dataset is already controlled for the gender of the person in the image, no? Unless there's some confounding factor at play, such as some demographic segment being more likely to optimize for good selfies occasionally but have boring feeds the rest of the time.

It would be super interesting, if the data is available, to normalize this by exposure: of the people who saw an image, how many clicked "like"?
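That exposure normalization is easy to sketch (all field names and numbers below are made up for illustration, not from the post's dataset):

```python
# Hypothetical exposure-normalized ranking: sort by like-rate
# (likes / views) rather than by raw like counts.
selfies = [
    {"id": "a", "likes": 120, "views": 10_000},
    {"id": "b", "likes": 90,  "views": 3_000},
]
for s in selfies:
    s["like_rate"] = s["likes"] / s["views"]

ranked = sorted(selfies, key=lambda s: s["like_rate"], reverse=True)
```

Here "b" would rank first: a 3% like-rate beats 1.2% despite fewer raw likes, which is the kind of confound the comment is pointing at.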

[+] the8472|10 years ago|reply
Well, one of the other factors is long hair and the tendency to oversaturate the face. Those factors don't seem independent to me: men are less likely to sport long hair, and they're also less likely to oversaturate the face to measure up to some skin-perfection standards (think of it as the photographic equivalent of makeup).

> but it can't be the real reason

Can't? On top of the above-listed aspects, it is entirely possible that both sexes find female appearance somewhat more aesthetically pleasing.

Similar to how focus group testing for computer voices tends to result in female voices being chosen (at least that's what I often hear, couldn't find a solid source).

Even if the bias is small the correlated factors would amplify it when you're optimizing for a maximum, i.e. for the top selection.

[+] lqdc13|10 years ago|reply
Yeah, female users probably post more pictures and also probably have more friends.
[+] JabavuAdams|10 years ago|reply
How to take a good selfie: don't be black or dark-skinned, unless you're a celebrity.

How do we prevent our AIs from learning racism?

EDIT> Informative article, BTW. A good read.

[+] Lawtonfogle|10 years ago|reply
If a given question has an answer that is due to racism, the answer is still the answer. For example, if society has some underlying racism that factors into what it considers attractive, that doesn't change what it considers attractive.

I don't think these algorithms are learning racism. They are only being blunt in revealing what already exists.

[+] apu|10 years ago|reply
This is an important point. People are thinking about it, and a lot of it will have to do with how the input data is gathered and curated.
[+] vonnik|10 years ago|reply
I think it's less about the head getting chopped off than about having "the head take up about 1/3 of the image," as Karpathy says. So what the net is learning is composition, or balance in an image, which is really cool. The rule of thirds is actually pretty well known to people in photography:

https://en.wikipedia.org/wiki/Rule_of_thirds

(Our deep-learning framework http://deeplearning4j.org missed his list, but it's got working convnets, too.)
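A crude way to state that composition rule as code (the helper function and numbers are hypothetical, not from the post or any library):

```python
# Hypothetical crop helper: choose a crop whose height is 3x the head
# height, so the head fills roughly one third of the frame.
def crop_for_head(img_h, head_top, head_h):
    """Return (top, bottom) rows of a crop sized so the head takes ~1/3."""
    crop_h = 3 * head_h
    top = max(0, head_top - head_h // 2)  # leave a little headroom
    return top, min(img_h, top + crop_h)

top, bottom = crop_for_head(img_h=900, head_top=100, head_h=200)
```

For a 900 px tall image with a 200 px head, this yields a 600 px crop, putting the head at one third of the frame height.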

[+] Jack000|10 years ago|reply
possibly, but none of the cropped examples have cropped chins. It's also well known in photography that you can cut off someone's forehead, but never their chin.
[+] netheril96|10 years ago|reply
One caveat with this machine-derived knowledge: it is prone to error, probably more than humans are, at least for now.

For example, if you train a CNN directly on human faces, its recognition rate falls well below what a human is capable of. Only after you apply tons of handcrafted optimizations, which are mostly black art, will you get close to or surpass a human's capability. Without much domain-specific tuning, an AI's insight is far from reliable.

[+] nl|10 years ago|reply
This is more wrong than right.

The example is correct, but not for the reasons stated. Humans are very, very good at face recognition. However, CNNs are pretty close to human performance for face detection.

> Only after you apply tons of handcrafted optimizations, which are mostly black art, will you get close to or surpass a human's capability. Without much domain specific tuning, an AI's insight is far from reliable.

This just isn't the case. Take the GoogLeNet or VGGNet papers, build the CNN as described using Caffe/whatever, train as described in the paper and you'll end up with something that is pretty much on par with human performance for categorizing ImageNet images.

Take that same CNN architecture, and retrain it for another domain and it will perform roughly as well there too, for the task of categorizing into ~1K-10K image classes.

This isn't domain specific tuning. It's domain specific training, which is very different (although collecting the data is a big job).

> Only after you apply tons of handcrafted optimizations, which are mostly black art, will you get close to or surpass a human's capability.

For CNNs, this is pretty much entirely false.

[+] eivarv|10 years ago|reply
What type of handcrafted optimizations are you talking about here?

The state of the art I've read about* (deep CNNs) in recent years relies more on generalized tricks like augmenting the training data (artificially inflating the data set), pre-training and fine-tuning, ReLU, regularization methods like dropout, etc.

For anyone interested, here [1] are some benchmarks.

* Late night here, but often in the vein of this [0] work.

[0]: https://www.cs.toronto.edu/~ranzato/publications/taigman_cvp...

[1]: http://vis-www.cs.umass.edu/lfw/results.html
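The data-augmentation trick mentioned above can be sketched in a few lines (a toy version for illustration, not code from any of the linked papers):

```python
import numpy as np

def augment(img):
    # inflate the data set with label-preserving transforms:
    # the original, a horizontal flip, and two slightly shifted crops
    h, w = img.shape[:2]
    out = [img, img[:, ::-1]]
    for dy, dx in [(0, 0), (2, 2)]:
        out.append(img[dy:h - 2 + dy, dx:w - 2 + dx])
    return out

img = np.arange(64, dtype=float).reshape(8, 8)
variants = augment(img)  # 4 training examples from 1 image
```

Real pipelines do this on the fly with random flips, crops, and color jitter, so the network rarely sees the exact same input twice.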

[+] RealityVoid|10 years ago|reply
It seems this neural network has a sense of humor if you look at the "Finding the Optimal Crop for a selfie" area.

You can see it optimized the last selfie by cropping the face fully out of the picture.. :))

[+] spikels|10 years ago|reply
DNNs are a key technology of the future. I highly recommend the education programs Professor Karpathy mentions at the end of this post. All are excellent and free.
[+] JoachimS|10 years ago|reply
A really good read. Good intro to ConvNets, a well designed and implemented test. And funny.
[+] trhway|10 years ago|reply
Looking at the top 100, one can only wonder how Hollywood figured it out well before the mighty power of the computer :)
[+] goodJobWalrus|10 years ago|reply
For me, this thing about having the top of your head cut off in the picture is new. Who would have thought?
[+] thewhitetulip|10 years ago|reply
Well, you don't need to ask a deep neural network to know that selfies are getting stupider by the day, with teens sticking their tongues out.
[+] visarga|10 years ago|reply
BEEP BEEP. Bad selfie detected. You run the risk of making a fool of yourself! BEEP BEEP