This is neat. I bet Facebook or OkCupid are sitting on all sorts of click data that could be used to develop tools for helping people make their photos look better. (Even if, personally, I can't wait for a cultural backlash against internet narcissism...)
[Edit: Even better, he didn't use click data to train the model, just public likes.]
The idea of using a convnet to reframe the selfie is neat. Makes it 5% better. Also, if it can run on the phone, it could possibly warn people that they are about to post a shitty selfie before they do.
He did run something like that for cropping. He showed his favourite two "rude" ones at the bottom, where the 'Net cropped out the face of the person taking the selfie.
One thing I always found interesting: LeCun is credited with developing convnets, but Hinton is apparently credited with scaling them up and showing the world how great they are in the 2012 paper. Why was Hinton's group (Toronto) able to publish these groundbreaking results before LeCun's group (NYU)?
Geoff Hinton had grad students who wanted to work on the problem, but Yann LeCun didn't.
"In about 2012, it should have been Yann's group, but Yann was unlucky, he didn't have a student who really wanted to do it. But we had a couple of students who wanted to do it and we took all of Yann's techniques and added some of our own."
> Be female. Women are consistently ranked higher than men. In particular, notice that there is not a single guy in the top 100.
This sounds true, but it can't be the real reason—selfies are ranked relative to the other images by the same user. So unless users are taking a lot of #selfies of people of different genders, we can assume the dataset is already controlled for the gender of the person in the image, no? Unless there's some confounding factor at play, such as some demographic segment being more likely to optimize for good selfies occasionally but have boring feeds the rest of the time.
Would be super interesting, if the data is available, to normalize this by exposure. Of the people who saw an image, how many clicked "like"?
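Sketching that normalization in a few lines (the field names are hypothetical, and it assumes the platform logs per-image impression counts, which only the platform itself would have):

```python
# Rank images by likes-per-view instead of raw like count, so a small
# account with engaged followers isn't buried under big accounts.
def like_rate(images):
    """Return image ids sorted by like rate (likes / views), best first.
    Images with zero recorded views are dropped rather than divided by zero."""
    rates = {
        img_id: stats["likes"] / stats["views"]
        for img_id, stats in images.items()
        if stats["views"] > 0
    }
    return sorted(rates, key=rates.get, reverse=True)

images = {
    "a": {"likes": 50, "views": 10_000},  # popular account, low rate
    "b": {"likes": 30, "views": 500},     # small account, high rate
}
print(like_rate(images))  # → ['b', 'a']
```

Raw like counts would rank "a" first; exposure-normalized rates flip the order.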
Well, one of the other factors is long hair, and another is the tendency to oversaturate the face. Those factors don't seem independent to me: men are less likely to sport long hair, and they're also less likely to oversaturate the face to measure up to some skin-perfection standard (think of it as the photographic equivalent of makeup).
> but it can't be the real reason
Can't? On top of the above-listed aspects, it is entirely possible that there is a bias whereby both sexes find female appearance somewhat more aesthetically pleasing.
Similar to how focus group testing for computer voices tends to result in female voices being chosen (at least that's what I often hear, couldn't find a solid source).
Even if the bias is small the correlated factors would amplify it when you're optimizing for a maximum, i.e. for the top selection.
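That amplification-at-the-tail effect is easy to demonstrate with a toy simulation (all numbers made up, purely illustrative):

```python
import random

# A small mean shift between two equally sized groups gets amplified
# when you only look at the extreme top of the distribution.
random.seed(0)

N = 100_000
# Group A: scores ~ Normal(0, 1); Group B: a tiny +0.3 advantage.
scores = [("A", random.gauss(0.0, 1.0)) for _ in range(N)] + \
         [("B", random.gauss(0.3, 1.0)) for _ in range(N)]

scores.sort(key=lambda t: t[1], reverse=True)
top = scores[:100]  # the "top 100" selection
share_b = sum(1 for g, _ in top if g == "B") / len(top)
print(f"Group B is 50% of the population but {share_b:.0%} of the top 100")
```

B's share of the top 100 comes out far above its 50% population share, even though the underlying bias is modest.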
If a given question has an answer that is due to racism, the answer is still the answer. For example, if society has some underlying racism that factors into what it considers attractive, that doesn't change what it considers attractive.
I don't think these algorithms are learning racism. They are only being blunt in revealing what already exists.
I think it's less about the head getting chopped than about having "the head take up about 1/3 of the image," as Karpathy says. So what the net is learning is composition, or balance in an image, which is really cool. The rule of thirds is actually pretty well known to people in photography.
Possibly, but none of the cropped examples have cropped chins. It's also well known in photography that you can cut off someone's forehead, but never their chin.
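The thirds heuristic itself is just arithmetic. A minimal sketch of a thirds-based vertical crop (the face-box coordinates are hypothetical; as I understand it, the post's actual cropper scores many candidate crops with the ConvNet rather than using a fixed rule):

```python
def thirds_crop(img_h, face_top, face_bottom, crop_h):
    """Choose a vertical crop window that puts the face center on the
    upper third-line of the crop, clamped to the image bounds.
    All arguments are pixel coordinates/heights."""
    face_center = (face_top + face_bottom) / 2
    # Upper third-line of the crop should coincide with the face center.
    top = face_center - crop_h / 3
    top = max(0, min(top, img_h - crop_h))  # clamp inside the image
    return int(top), int(top + crop_h)

# 900px-tall photo, face spanning rows 100..300, cropping to 600px:
print(thirds_crop(900, 100, 300, 600))  # → (0, 600)
```

For an unclamped case like `thirds_crop(900, 400, 500, 600)`, the face center lands exactly `crop_h / 3` below the top of the crop.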
One caveat with this machine-derived knowledge: it is prone to error, probably more than humans are, at least for now.
For example, if you train a CNN directly with human faces, its recognition rate comes way below what a human is capable of. Only after you apply tons of handcrafted optimizations, which are mostly black art, will you get close to or surpass a human's capability. Without much domain specific tuning, an AI's insight is far from reliable.
The example is correct, but not for the reasons stated. Humans are very, very good at face recognition. However, CNNs are pretty close to human performance for face detection.
> Only after you apply tons of handcrafted optimizations, which are mostly black art, will you get close to or surpass a human's capability. Without much domain specific tuning, an AI's insight is far from reliable.
This just isn't the case. Take the GoogLeNet or VGGNet papers, build the CNN as described using Caffe/whatever, train as described in the paper and you'll end up with something that is pretty much on par with human performance for categorizing ImageNet images.
Take that same CNN architecture, and retrain it for another domain and it will perform roughly as well there too, for the task of categorizing into ~1K-10K image classes.
This isn't domain specific tuning. It's domain specific training, which is very different (although collecting the data is a big job).
> Only after you apply tons of handcrafted optimizations, which are mostly black art, will you get close to or surpass a human's capability.
What type of handcrafted optimizations are you talking about here?
The state of the art I've read about* (deep CNNs) in recent years relies more on generalized tricks like augmenting the training data (artificially inflating the data set), pre-training and fine-tuning, ReLUs, regularization methods like dropout, etc.
For anyone interested, here [1] are some benchmarks.
* Late night here, but often in the vein of this [0] work.
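Of those tricks, augmentation is the simplest to show. A toy sketch with images as nested lists (real pipelines use image libraries and also add random crops, color jitter, etc.):

```python
def augment_with_flips(dataset):
    """Double a labeled image dataset by adding horizontal mirrors.
    A horizontal flip preserves the label for most natural categories,
    so it inflates the training set 'for free'."""
    flipped = [([row[::-1] for row in img], label) for img, label in dataset]
    return dataset + flipped

tiny = [([[1, 2], [3, 4]], "cat")]
aug = augment_with_flips(tiny)
print(len(aug))   # → 2
print(aug[1][0])  # → [[2, 1], [4, 3]]
```

The same idea extends to other label-preserving transforms (small rotations, brightness shifts), each multiplying the effective dataset size.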
DNNs are a key technology of the future. I highly recommend the educational programs Professor Karpathy mentions at the end of this post. All are excellent and free.
http://blog.okcupid.com/index.php/dont-be-ugly-by-accident/
Feed it an initial picture (noise, clouds, a selfie) and then manipulate the input via backpropagation to maximize the assessed quality of the "selfie".
I guess that would look pretty funny.
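That's essentially gradient ascent on the input. With the real ConvNet you'd backprop the score all the way to the pixels; here is a toy 1-D version of the same loop, with a made-up differentiable "quality" function standing in for the network:

```python
def quality(x):
    """Stand-in for the ConvNet's selfie score; peaks at x = 3."""
    return -(x - 3.0) ** 2

def ascend(x, steps=200, lr=0.1, eps=1e-5):
    """Nudge the input uphill using a finite-difference gradient,
    exactly as backprop-to-the-input would, but without autodiff."""
    for _ in range(steps):
        grad = (quality(x + eps) - quality(x - eps)) / (2 * eps)
        x += lr * grad
    return x

print(round(ascend(0.0), 3))  # converges toward the maximizer, 3.0
```

With a ConvNet score instead of this toy function, the loop produces DeepDream-style images: inputs warped toward whatever the net considers a high-scoring selfie.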
How do we prevent our AIs from learning racism?
Edit: Informative article, BTW. A good read.
https://en.wikipedia.org/wiki/Rule_of_thirds
(Our deep-learning framework http://deeplearning4j.org missed his list, but it's got working convnets, too.)
> Only after you apply tons of handcrafted optimizations, which are mostly black art, will you get close to or surpass a human's capability.
For CNNs, this is pretty much entirely false.
[0]: https://www.cs.toronto.edu/~ranzato/publications/taigman_cvp...
[1]: http://vis-www.cs.umass.edu/lfw/results.html
You can see it optimized the last selfie by cropping the face fully out of the picture. :))