Can we perhaps edit "singularity is near" out of the title? This sounds impressive, but having a bunch of racks able to classify the outline of a face is vastly disconnected from machine and humanity merging.
I was going to make the same request.
The singularity should be discussed where relevant, not added to everything.
This paper is producing high level features from noisy data in an unsupervised fashion -- a human still needs to indicate the task it should be targeted for and a human still needs to provide labelled training data for these high level features to be of use.
This work is interesting enough to warrant detailed discussion on the topic at hand, large scale machine learning, rather than just rehashing discussions of the singularity.
Added: As I can't reply to the comment below I'll do it here =] The network provides learned representations that are discriminative.
The aim of the network is to learn high level features representative of the content.
One of the many features it produced was one which accurately indicated the presence of a face in the image.
Note that they said train a face detector and not classify.
For example, from the same network there was a feature which accurately detected cats, yet they didn't explicitly train a cat detector either (see the section "Cat and human body detectors").
As the network represents the content as generic features it is clear that, if it reaches a high enough level, those features are essentially classifications themselves.
tldr; High-level features generated by this unsupervised network are so high-level that one of them aligns with "has a face in the image", others with "has a cat in the image", etc., but these features cannot be used without labelled training data.
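To make the tldr concrete, here's a toy numpy sketch (nothing here is from the paper: the data is made up, and a random projection + ReLU merely stands in for the learned autoencoder features). The point is that labels only ever touch the second, supervised stage:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Stage 1: "unsupervised" feature extractor (toy stand-in). ---
# The paper learns features with a large autoencoder; here a fixed random
# projection + ReLU just plays the role of *some* label-free encoder.
W_feat = rng.normal(size=(20, 50))          # 20 raw dims -> 50 features

def features(X):
    return np.maximum(X @ W_feat, 0.0)      # ReLU activations

# --- Stage 2: supervised head trained on the frozen features. ---
# Labelled data enters only at this point.
X = rng.normal(size=(500, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)   # synthetic labels

F = features(X)
w = np.zeros(F.shape[1])
for _ in range(300):                        # plain logistic regression
    p = 1.0 / (1.0 + np.exp(-F @ w))
    w -= 0.01 * F.T @ (p - y) / len(y)

acc = ((F @ w > 0) == y).mean()             # well above chance
```

Swap in any real unsupervised learner for stage 1 and the structure is the same: features are generic, the classifier on top is where labels are spent.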
Normally I advocate adherence to posting the original article title on HN, but if that had been the case I doubt this article would ever have got enough attention to be upvoted. "Singularity is near" is over the top.
It does this for 20,000 different object categories - this is getting close to matching human visual ability (and there are huge societal implications if computers reach that standard).
This is the most powerful AI experiment yet conducted (publicly known).
I put the singularity bit in to make it relevant for those who are non-technical. This experiment is significant because it shows that large artificial neural networks can be made to work. People have tried and failed at this for decades.
This technique was "discovered" by Geoff Hinton at the University of Toronto in 2005. However, nobody had tried (or maybe got enough funding to try) it at this scale.
If this continues to work at larger and larger scales, this would be a machine learning technique that works accurately on tasks that are hugely important to society:
- accurate speech recognition
- human-level computer vision (making human manual labor redundant)
Yes, 15% accuracy doesn't seem great (though the previous best on the same dataset was 9.3%).
BUT the detector built its own categories(!). It managed to find 20,000 different categories of objects in YouTube videos, and one of these categories corresponded to human faces, and another to cats.
Once the experimenters found the "face detection neuron" and used it to test for faces, THAT neuron managed an 81.7% detection rate(!).
Forget the singularity, and just think about how amazing that is. The system trained itself - without human labelling - to distinguish human faces correctly over 80% of the time.
You're in danger of missing the point too far in the other direction. The system just returns yes/no as to whether an image has a face in it, and if it was hard-coded to respond "no" it would score 64.8%.
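To see why the base rate matters, here's a toy numpy sketch (the 64.8%/35.2% class split is from the paper; the activations and the threshold are entirely made up). A constant "no" already scores the base rate, so a detector is only impressive by how far it beats that:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical test set mirroring the paper's class balance:
# 64.8% of images contain no face, 35.2% do.
n = 1000
has_face = rng.random(n) < 0.352

# A classifier hard-coded to answer "no" scores the base rate.
baseline_acc = (~has_face).mean()           # ~0.648

# A single "face neuron": by assumption here, its activation is
# higher on face images; thresholding it gives a detector.
activation = rng.normal(loc=has_face * 2.0, scale=1.0)
pred = activation > 1.0
detector_acc = (pred == has_face).mean()    # beats the baseline
```

So the interesting number isn't the raw accuracy but the gap over the always-"no" strawman.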
Obviously this is extremely impressive work, and given that Google gives away 1e9 core hours a year [0], I'd like to see how much further they can push this network (which only used about 16e3 cores x 3 days x 24 hours ~ 1e6 core hours). But this isn't like scoring 80% in a written exam.
I'm also impressed by how readable the paper was. Apart from a few paragraphs of detailed maths this should be accessible to anyone who's read the Wikipedia article on neural networks.
[0] http://googleblog.blogspot.com/2011/04/1-billion-computing-c...
It's not revolutionary. Clustering algorithms and neural nets are plentiful. Really, what differentiates this network is its scale.
There was an interesting discussion on Quora about this recently [0].
The most relevant quote being perhaps:
"The magic of the brain is not the number of neurons, but how the circuits are wired and how they function dynamically. If you put 1 billion transistors together, you don't get a functioning CPU. And if you put 100 billion neurons together, you don't get an intelligent brain."
[0] http://www.quora.com/How-big-is-the-largest-feedforward-neur...
That's an interesting discussion, but this experiment suggests exactly the opposite (perhaps that's why you included the discussion). Who knows, if we put 1 billion cores together, and fed it a massive amount of data (akin to what a baby receives as he/she matures), perhaps we would get a brain we would consider "intelligent". The fact that this system was able to pick out high-level features like "face" and "cat" without any prior training -- and with only 1000 cores, not 1 billion -- is quite suggestive that they're on to something.
EDIT: Mistyped number of cores. 1000, not 100.
16,000 cores sounds impressive until you realize it's just five to ten modern GPUs. See: http://www.nvidia.com/object/tesla-servers.html (4.5 teraflops in one card). For Google, it's easier to just run a 1,000-machine job than requisition some GPUs.
Reminder: GPUs will destroy the world.
What a GPU calls a "core" doesn't at all correspond to what a CPU calls a "core". Going by the CPU definition (something like "something that can schedule memory accesses") a high-end GPU will only have 60 or so cores. And going by the GPU definition (an execution unit) a high-end CPU will tend to have 30-something cores.
GPUs do have fundamentally more execution resources, but that comes at a price and not every algorithm will be capable of running faster on a GPU than on a CPU. If neural networks just involve multiplying lots of matrices together with little branching they might be well suited to GPUs, but most AI code isn't like that.
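For what it's worth, here's roughly what "multiplying lots of matrices together with little branching" looks like for a feedforward net (a made-up toy, not the paper's actual architecture). The whole forward pass is a handful of dense matmuls plus elementwise nonlinearities, which is exactly the branch-free workload GPUs are built for:

```python
import numpy as np

rng = np.random.default_rng(2)

# One batch of 256 "images" flattened to 1,024 inputs.
x = rng.normal(size=(256, 1024))

# Three fully connected layers; sizes are arbitrary.
sizes = [1024, 512, 512, 10]
weights = [rng.normal(size=(a, b)) / np.sqrt(a)
           for a, b in zip(sizes, sizes[1:])]

# Forward pass: matrix multiply, elementwise ReLU, repeat.
# No data-dependent branching anywhere.
h = x
for W in weights[:-1]:
    h = np.maximum(h @ W, 0.0)
logits = h @ weights[-1]
```

Most classical AI code (search, symbolic reasoning) is the opposite: branch-heavy and irregular, hence the caveat above.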
The singularity is a poorly constructed myth. It is built around the presumption that intelligence is a linear function of CPU power, and that surely as CPU power rises, so shall intelligence; the problem is, that prediction was made in the 1970s, since which CPU power has risen ten decimal orders of magnitude, and we still don't have much better speech recognition than we did back then, let alone anything even approaching simple reasoning.
The ability to detect faces is not a signal that general intelligence is right around the corner.
Well, maybe. There are a whole lot of very different things called "The Singularity" and some of them are much more reasonable than others.
There's the Campbellian Singularity, which says that we won't be able to predict what will happen next. Pretty non-controversial as far as it goes.
There's the Vingean Singularity, which says that if we ever develop AIs that can think as fast and as well as humans then due to Moore's Law they'll be thinking twice as fast as humans after 2 years, so they'll start designing chips and the period of Moore's law will fall to 1 year, and so on with us reaching infinite computing power in finite time. I think this vision is flawed.
Relatedly, there's the Intelligence Explosion Singularity (associated with Yudkowsky), which says that as soon as it's AIs designing AIs, smarter AIs will relatively quickly be able to make even smarter AIs and we'll get a "fwoosh" effect, though not to infinity in finite time. I find this unlikely, but can't rule it out.
There's one I don't have a handy name for, but let's call it the AI Revolution viewpoint, which is that AIs will cause civilization to switch to a faster mode of progress, just like the Agricultural Revolution and Industrial Revolution did. This one will only look like a singularity in hindsight, and might seem gradual to the people living through it. I think this one is pretty credible.
There's the Kurzweilian Singularity, where thanks to Accelerating Change we'll someday pass a point which will arbitrarily be called the Singularity. As far as I can tell this is Kurzweil appropriating the hot word of the moment for his ideas a la JavaScript.
Then there's the Naive Singularity, which equates processing power with intelligence and then concludes that computers must be getting smarter. This is indeed totally naive and not something we should worry about. I guess the linked paper is evidence that you can substitute a faster computer for smarter AI researchers to some extent, but probably not a very large one.
The only originality in this work is the processing power used. The principle in itself is not original.
As for any resemblance to how the real cortical neural network works in face or object recognition, this is just a farce.
Regarding getting closer to the presumed singularity, this is like saying that cutting flint is close to making diamonds.
The authors didn't claim that, but the abusive use of "neural network" for such kinds of applications does just that. It is a dishonest abuse of people who can't tell the difference.
The true problem is that significant quality work toward modeling the real cortical neural network is drowned in a sea of such fake crap.
I think much of this is overhyped as well, but I disagree that modeling human brain structure is what's relevant. The term "neural network" has historical baggage (responsible for some of the hype), but these days refers to a class of mathematical approaches with only a historical connection to "neurons". Those can be interesting on their own for AI purposes, and imo accurate modeling of the human brain, while interesting for neuroscience research, is not necessarily the way forward for AI research.
You'd be surprised at how little precision this much computing power gets you, even on very basic classifiers. If anything, working on this kind of stuff has given me an appreciation for just how far away the singularity may be.
No more supplements-eating Kurzweil, walking Terminators and Skynet-like BS please.
Maybe I'm missing something here, but how exactly is it "unlabeled" data if they're specifically feeding it millions of pictures of "faces"? I mean, if you make a specific selection of the type of images you train the network on, isn't that basically equivalent to labeling them?
The aim of the paper was to produce an unsupervised system that would generate high level features from noisy data.
These high level features could then be used in supervised systems where labelled data is added.
Thus, the paper is about using an unsupervised system to help a later supervised system. An advantage of this is that, as the unsupervised system isn't trained to recognise object X, it instead learns features that are discriminative. This same network could be used to recognise arbitrary objects (which is what they do later on in the paper with ImageNet).
Nope. Not all the images contained faces (cats, bodies, etc.). There was no specific face-detection code. The system just learned the concepts from the data. http://en.wikipedia.org/wiki/Unsupervised_learning
You're correct: it isn't an unlabelled system, and the article author is deeply confused about basic topics in artificial intelligence.
What he's trying to talk about is "this is an unsupervised feature detector in a large dataset which is only categorized, and where no human has provided correct answers up front to verify progress."
The reason this matters (and it doesn't matter very much) is that it means that in cases where it's prohibitive to provide training sets, such as where you don't know the good answer yourself, or where giving a decent range of good answers would be difficult, this sort of approach can still be used.
"isn't that basically equivalent to labeling them?"
Yes. It is. The original poster is confused.
What he meant to say was "there is no training set."
Before, there was room to double accuracy 3 times. Now, there's not. If I understood correctly, their approach can take advantage of parallelism. I'm not saying they can just throw 128k cores at the problem and be done, but adding 2^n resources will likely give a nice boost to results. OTOH, it's late and I might be way off.
They just have to scale it up - more computers, more days - and the accuracy level should increase accordingly (that is my intuition and hope on this, though I could be wrong).
This isn't singularity material. While this may not be a bog-standard neural network, it has no feedback. It cannot think, because thinking requires reflection. It is trained by adjusting the weights of the connections after the fact using an equation.
Is it cool, and perhaps even useful? Yes. But don't confuse this research project for a precursor to skynet.
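The "equation" in question is just gradient descent. A minimal sketch on a single linear neuron (made-up data, not the paper's training procedure):

```python
import numpy as np

rng = np.random.default_rng(3)

# One linear neuron fitting y = 2*x1 - x2. The "equation" that adjusts
# the connection weights after each pass is the gradient-descent update
#   w <- w - lr * dE/dw
X = rng.normal(size=(200, 2))
y = 2.0 * X[:, 0] - X[:, 1]

w = np.zeros(2)
lr = 0.1
for _ in range(100):
    err = X @ w - y                    # prediction error on the data
    w -= lr * X.T @ err / len(y)       # the weight-update equation

# w converges toward the true weights [2, -1]
```

No reflection, no feedback loop: just repeated error-driven weight adjustment, which is the distinction being drawn above.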
I put the singularity bit in to make it relevant to people who would otherwise not get the significance of this (which is that large scale neural nets can work - something people have been trying and failing at for decades).
If anyone is interested to read more on this topic, there is another recent, closely related and perhaps slightly more accessible paper ( "High-Level Invariant Features with Scalable Clustering Algorithms" http://bit.ly/KDuN04 ) from Stanford that also learns face neurons from an unlabelled collection of images (disclaimer: I'm a co-author). It uses a slightly different model based on layers of k-means clustering and linking, but the computation in the end is very similar.
I'm familiar with both models so I can also try to answer any questions.
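For anyone wondering what "layers of k-means clustering" means in practice, here is a single-layer toy sketch in numpy (not the actual pipeline from either paper; the "triangle" encoding follows Coates et al., and the data is made up):

```python
import numpy as np

rng = np.random.default_rng(4)

# Unlabeled "patches" drawn from a few hidden clusters.
centers_true = rng.normal(size=(5, 16)) * 3.0
patches = np.concatenate(
    [c + rng.normal(size=(100, 16)) for c in centers_true])

# Plain k-means: the learned centroids become the "features".
k = 5
centroids = patches[rng.choice(len(patches), k, replace=False)]
for _ in range(20):
    d = ((patches[:, None, :] - centroids[None]) ** 2).sum(-1)
    assign = d.argmin(1)
    for j in range(k):
        pts = patches[assign == j]
        if len(pts):
            centroids[j] = pts.mean(0)

# "Triangle" encoding: a patch activates the features whose
# centroids are closer to it than the average centroid distance.
def encode(x):
    dist = np.sqrt(((x - centroids) ** 2).sum(-1))
    return np.maximum(dist.mean() - dist, 0.0)

code = encode(patches[0])   # sparse, nonnegative feature vector
```

Stacking another round of clustering on these codes gives the "layers" part; no labels are used anywhere.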
I'm seriously considering quitting my job and studying ML for a few months in a desperate attempt to get work on projects like this. I feel like I'm missing out but am too dumb for traditional grad school.
Could anyone with expertise say if this would be enough to build a foundation? How much math background do you need?
https://www.coursera.org/course/ml (from one of the authors of this paper!)
https://www.coursera.org/course/vision
https://www.coursera.org/course/computervision
Prof. Hinton's videos are very watchable:
http://www.youtube.com/watch?v=AyzOUbkUf3M
http://www.youtube.com/watch?v=VdIURAu1-aU
From the article: "It is worth noting that our network is still tiny compared to the human visual cortex, which is 1,000,000 times larger in terms of the number of neurons and synapses."
> the dataset has 10 million 200x200 pixel images downloaded from the Internet
From the paper: "Our training dataset is constructed by sampling frames from 10 million YouTube videos."
They take the frames from YouTube. It is weird to me that YouTube (derided as a way of sharing funny cat videos) is able to contribute something actually useful to the world.
YouTube contains a lot of educational material you would not find otherwise. If I wanted to learn sewing on a sewing machine, I would just watch some video tutorials - try that with a book on sewing. Same thing with instructions on how to play instruments. Heck, you can even watch videos on how to fix problems with your car engine. Many procedural instructions can't be conveyed properly via papers or books. It also allows asynchronous video messaging for laymen asking experts things that are difficult to get across in written text. I bet that YouTube will contribute very much to knowledge preservation and distribution in the long term.
As far as I can tell, this is "let's train a huge number of models and then cherry-pick the few that work well on a test set", i.e. overfitted junk. What have I missed?