
Scientists See Promise in Deep-Learning Programs

141 points | mtgx | 13 years ago | nytimes.com | reply

68 comments

[+] bravura|13 years ago|reply
Since I see some misunderstanding about deep learning, let me explain the fundamental idea: It's about reusing intermediate work.

The intuition: suppose I told you to write a complicated computer program, and that you could use routines and subroutines but no subsubroutines or deeper levels of abstraction. In this restricted case you could still write any program, but you would have to do a lot of code-copying. With arbitrary levels of abstraction, you could reuse code much more elegantly, and your code would be more compact.

Here is a more formal description: a complicated non-linear function can be described much like a circuit. If you restrict the depth of the circuit, you can in principle still represent any function, but you need a really wide (exponentially wide) circuit, which can lead to overfitting (Occam's Razor). With a deep circuit, by contrast, you can represent arbitrary functions compactly.
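
A toy illustration of this depth/width trade-off (my own sketch, not from the comment) is the n-bit parity function: a chain of two-input XOR gates computes it with n-1 gates, while a depth-two (DNF) circuit needs one AND term per odd-weight input pattern, i.e. 2^(n-1) terms.

```python
from itertools import product

def parity_deep(bits):
    # "Deep" circuit: a chain of n-1 two-input XOR gates -- linear size.
    acc = 0
    for b in bits:
        acc ^= b
    return acc

def shallow_terms(n):
    # Depth-two (DNF) circuit: one AND term per odd-weight input
    # pattern, so 2**(n-1) terms -- exponential size.
    return [p for p in product([0, 1], repeat=n) if sum(p) % 2 == 1]

def parity_shallow(bits):
    # The shallow circuit fires iff the input matches one of its terms.
    return int(tuple(bits) in set(shallow_terms(len(bits))))
```

Both compute the same function; only the representation size differs.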

Standard SVMs and random forests can be shown, mathematically, to have a limited number of layers (circuit depth).

It turns out that expressing deep models using neural networks is quite convenient.

I gave an introduction to deep learning in 2009 that describes these intuitions: http://vimeo.com/7977427

[+] tgflynn|13 years ago|reply
If you restrict the depth of the circuit, you can in principle represent any function, but you need a really wide (exponentially wide) circuit.

Are you sure it's exponential ?

If you look at binary functions (i.e. boolean circuits), any such function can be represented by a single-layer function whose size is linear in the number of gates of the original function (I think it's 3 or 4 variables per gate) by converting to conjunctive normal form.

Of course it's not obvious that a similar scaling exists for non-binary functions but I'd be a bit surprised if increasing depth led to an exponential gain in representational efficiency.
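
The linear-size conversion alluded to above introduces one auxiliary variable per gate (the Tseitin transformation); a sketch for a single AND gate, with hypothetical variable numbering, looks like this:

```python
from itertools import product

def tseitin_and(out, a, b):
    # Encode out <-> (a AND b) as three CNF clauses.
    # Literals are signed ints: +v means v, -v means NOT v.
    return [[-out, a], [-out, b], [-a, -b, out]]

def satisfies(clauses, assign):
    # assign maps variable -> bool; a clause holds if any literal is true.
    return all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses)

# Variable 3 is the auxiliary variable for the gate's output wire.
clauses = tseitin_and(3, 1, 2)
for va, vb in product([False, True], repeat=2):
    assert satisfies(clauses, {1: va, 2: vb, 3: va and vb})
```

Each gate contributes a constant number of clauses over three variables, so the whole circuit flattens to a CNF linear in the gate count.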

[+] claudiusd|13 years ago|reply
I'm not an expert on this, but I think this article overstates the relationship between "deep learning methods" and "neural networks". Neural nets have been around forever and, in the feed-forward case, are actually fairly basic statistical classifiers.

Deep learning, on the other hand, is about using layers of classifiers to progressively recognize higher-order concepts. In computer vision, for example, the first layer of classifiers may recognize things like edges, blocks of color, and other simple concepts, while progressively deeper layers may recognize things like "arm", "desk", or "cat" from the lower-order concepts.

There's a book I read a while ago that was super-interesting and digs into how one researcher leveraged knowledge about how the human brain works to develop one of these deep learning methods: "On Intelligence" by Jeff Hawkins (http://www.amazon.com/On-Intelligence-Jeff-Hawkins/dp/B000GQ...)

[+] wookietrader|13 years ago|reply
No.

All currently used deep learning algorithms are special cases of neural networks. The reason this is called "deep" learning is that before 2006, no one knew how to efficiently train neural nets with more than 1 or 2 hidden layers. (Nor could they have, given the computing power available.) Thanks to a breakthrough by Dr Hinton, this is now possible.

But all models used are neural nets. It's just that a vast number of new algorithms for training them have been developed in recent years, and people have come up with new ideas on how to use them.

But it is all neural nets. And that's the whole beauty of it.

[+] theschwa|13 years ago|reply
Geoffrey Hinton, mentioned in the article, has his class on neural networks available on Coursera https://www.coursera.org/course/neuralnets
[+] djacobs|13 years ago|reply
Hinton was one of the people who invented backpropagation, which has made neural nets as powerful as they are today. Despite his brilliance and intimate familiarity with backpropagation, his explanation of it is stunningly clear and simple. I'm thoroughly enjoying this course and recommend it to anyone who wants to build their own neural networks.
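
For anyone who hasn't seen backpropagation written out: a minimal numpy sketch (not from the course; the network size, seed, and learning rate are arbitrary choices) training a 2-8-1 net on XOR by pushing gradients back through the layers with the chain rule:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

losses = []
for step in range(3000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((p - y) ** 2)))
    # Backward pass: chain rule, layer by layer.
    dp = (p - y) * p * (1 - p)         # gradient at output pre-activation
    dW2, db2 = h.T @ dp, dp.sum(0)
    dh = (dp @ W2.T) * h * (1 - h)     # gradient pushed back through layer 2
    dW1, db1 = X.T @ dh, dh.sum(0)
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= 0.5 * grad            # plain gradient descent, lr = 0.5
```

The mean squared error drops over training; the same backward-pass pattern extends to any number of layers.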
[+] ozgung|13 years ago|reply
I was watching his lectures and saw this post when I took a break. He was talking about the ups and downs in the history of neural nets. As far as I understand from these lectures, we're on the verge of a new up phase: neural nets are meaningful when they are large and deep, and training such nets is finally becoming feasible.
[+] riffraff|13 years ago|reply
I started taking the class but had to take a break due to other real-life commitments. It was very enjoyable, both content-wise and style-wise; for what my opinion counts, I recommend it.
[+] dave_sullivan|13 years ago|reply
For those looking to learn about these techniques, I'd highly recommend the deep learning Theano tutorials.

Hinton has a class on Coursera--I think it would be very confusing for beginners, but it has really great material.

Also, I run the "SF Neural Network Aficionados" meetup in San Francisco and will be giving a workshop in January about building your own DBN in Python, so feel free to check that out if you're in SF (although space was an issue last time).

[+] plg|13 years ago|reply
Pls put notes & code online
[+] stewie2|13 years ago|reply
How is "deep learning" different from "neural network"?
[+] gdahl|13 years ago|reply
I was involved in the speech recognition work mentioned in the article and I led the team that won the Merck contest if anyone has any questions about those things. I also spend some time answering any machine learning question I feel qualified to answer at metaoptimize.com/qa
[+] lbenes|13 years ago|reply
Congratulations on winning the Merck contest! That was an impressive demonstration.

About 12 years ago, I switched from a Bio major to CS. I hoped to major in AI, but after taking 2 upper level classes, one focusing on symbolic AI and the other focusing on Bayesian networks, I was completely turned off.

Our brains are massively parallel redundant systems that have practically nothing in common with modern von Neumann CPUs. It seemed the only logical approach to AI was to study neurons, then try to discover the basic functional units they form in simple biological life forms like insects or worms, and keep reverse-engineering the brains of higher and higher life forms until we reach human-level AI.

Whenever I tried to relate my course material in AI to what was actually going on in a brain, my profs met my questions with disdain and disinterest. I learned more about neurons in my high school AP Bio class than in either of my AI classes. In their defense, we've come a long way since then, with new tools like MRIs and neural probes.

The answers are all locked up in our heads. It took nature millions of years of natural selection to engineer our brains. If we want to crack this puzzle in our lifetimes, we need to copy nature, not reinvent it from scratch. Purely mathematical theories like Bayesian statistics that have no basis in biological systems might work in specific cases, but they are not going to give us strong AI.

Are these new deep learning algorithms for neural networks rooted in biological research? Do we have the necessary tools yet to start reverse-engineering the basic functional units of the brain?

[+] Aron|13 years ago|reply
I worked on the Netflix prize and haven't kept up since then. There the RBM (or the modified version per Ruslan's paper) performed very well, but not substantially better than the linear models (in an apples-to-apples comparison, i.e. ignoring the time dimension and peeking at the contents of the quiz/test set). And as I recall, no one really made any progress with deeper networks on that problem. Has anything been learned since then that would suggest progress there?

I also don't recall anyone successfully incorporating the date of the rating into the RBM. Mostly this was useful in other models because on any particular day people would just bias their ratings up or down a bit. But also, as one can imagine, over the course of a year or two their tastes would change. Is it straightforward to include that time dimension into RBMs, and if so, is that a recently discovered technique?

[+] taliesinb|13 years ago|reply
I played around for a while with writing an RBM learner in Go (RBMs are a particular instance of deep learning which Hinton specializes in).

More an experiment than anything else, but for anyone who is interested: https://github.com/taliesinb/gorbm. I don't claim there aren't bugs, and there is no documentation.

The consensus I've picked up from AI-specializing friends is that there are a lot of subtle gotchas and tricks (which Hinton and friends know about but don't necessarily advertise) without which RBMs are a non-starter for many problems. Which I suppose is pretty much standard for esoteric machine learning.
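
For reference, the core RBM training loop people usually mean is one step of contrastive divergence (CD-1); a minimal numpy sketch (my own, unrelated to the linked Go code; sizes, seed, and learning rate are arbitrary, and this deliberately omits the practical tricks mentioned above, like momentum, weight decay, and mini-batching):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_vis, n_hid, lr = 6, 4, 0.1
W = rng.normal(0, 0.01, (n_vis, n_hid))
a, b = np.zeros(n_vis), np.zeros(n_hid)   # visible / hidden biases

def cd1_update(v0):
    global W, a, b
    ph0 = sigmoid(v0 @ W + b)                       # P(h=1 | data)
    h0 = (rng.random(n_hid) < ph0).astype(float)    # sample hidden units
    pv1 = sigmoid(h0 @ W.T + a)                     # one-step reconstruction
    ph1 = sigmoid(pv1 @ W + b)
    # Positive phase (data) minus negative phase (reconstruction).
    W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    a += lr * (v0 - pv1)
    b += lr * (ph0 - ph1)

data = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
for epoch in range(200):
    for v in data:
        cd1_update(v)
```

The gotchas are all in what this sketch leaves out: initialization scale, learning-rate schedules, and when to use probabilities versus samples in each phase.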

[+] jhartmann|13 years ago|reply
Deep belief networks are extremely powerful; we are finally getting to the point where we don't need to do tons of feature engineering to make useful complex classifiers. It used to be that you would have to spend a ton of time on data analysis and feature extraction to get useful and robust classifiers, and the usefulness of those networks was limited by how well you did the feature extraction. Now you can train networks on much more minimally processed data and get great results out of them.
[+] mbq|13 years ago|reply
Since the fall of AI, there have been two groups of people in this topic -- one trying to produce reproducible, robust results with well-defined algorithms, and a second importing random ideas from the first group into some questionably defined ANN model and getting all the hype because of the "neural" buzzword. "Deep learning" is actually just boosting, and it has been around for years.
[+] robrenaud|13 years ago|reply
Unsupervised pre-training is fundamentally different than boosting.

Boosting is a clever way of modelling a conditional distribution. The insight behind the success of pre-training is that, for many perceptual tasks, having a good model of the input (rather than the input->output mapping) is key.

I have no delusion that the algorithms that work for training deep networks are anything like what the brain actually does, but I don't care. There are many tasks where deep neural nets are state of the art.

[+] iskander|13 years ago|reply
Boosting selects successive weak learners for the same classification problem, but under a changing distribution/weighting of the input space. Deep learning stacks complex models to create increasingly abstract representations. All I can really imagine them having in common is (1) they're both families of machine learning techniques and (2) they both (roughly) involve a collection of models, albeit in very different ways.
[+] gdahl|13 years ago|reply
Deep learning is not boosting at all. Deep learning is about composing trainable modules: stacking a layer f(x) on a layer g(x) to get h(x) = f(g(x)). Boosting creates a final classifier that is a weighted sum of the base classifiers, something like h(x) = a * f(x) + b * g(x). Composition is what Professor Hinton means when he says "re-represent the input" and similar phrases.
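
The contrast fits in a few lines of code (f and g here are arbitrary stand-in functions, not any particular model):

```python
import numpy as np

f = lambda x: np.tanh(2.0 * x)    # stand-in layer / base model
g = lambda x: np.tanh(x + 1.0)    # another stand-in layer / base model

def h_deep(x):
    # Deep learning: composition -- f re-represents g's output.
    return f(g(x))

def h_boost(x, wa=0.7, wb=0.3):
    # Boosting: a weighted sum of base models over the *same* input.
    return wa * f(x) + wb * g(x)
```

In the composed version, f never sees the raw input at all, only g's representation of it; in the boosted version, every base model sees the same raw input.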
[+] Create|13 years ago|reply
The students were also working with a relatively small set of data;

ANNs are overfitted more often than not.

[+] radarsat1|13 years ago|reply
Are there any good C++ or Python SciPy libraries for building and training deep learning networks?
[+] jhartmann|13 years ago|reply
There is a C++/CUDA library with a Python frontend, from one of the guys who works with Hinton, that I am starting to play with. It is written by Alex Krizhevsky and has lots of tools for training feed-forward networks with many different connection topologies and neuron types. If I am not mistaken, this is the library that was used in the recent Kaggle drug competition referenced in the article. There is some good starting-point documentation there as well; as long as you know enough about the mechanics of artificial neural networks, it has some really interesting stuff in it.

Here is the link: http://code.google.com/p/cuda-convnet/

[+] teeja|13 years ago|reply
Is there a good place to plug in to get an overview of what has been and is going on in this area, without having to dive in all the way? An overview of the concepts, not the nuts and bolts, not the heavy lifting.
[+] mikecane|13 years ago|reply
Can someone contrast what's in that article with what Jeff Hawkins' Numenta is attempting?