
Neurogenesis Deep Learning

197 points | groar | 9 years ago | arxiv.org | reply

97 comments

[+] SubiculumCode|9 years ago|reply
They simulate neurogenesis, I guess, but they do not incorporate the most interesting part of that neurogenesis: that is, the new neurons are born into the dentate gyrus, a region thought to have a particular capacity to orthogonalize feature representations that are similar (e.g. pattern separation), allowing distinct memories to be formed for similar events. The dentate gyrus outputs to a region called Cornu Ammonis 3 (CA3), which is heavily recurrent and thought to be able to pattern-complete a full representation from partial inputs. That is, CA3 can encode and retrieve the relations between 2 or more features or objects. For a mathematical model and review one might read: Rolls (2013) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3812781/

but many others exist. I'd write more but typing on my phone is driving me to distraction.

[+] arkymark|9 years ago|reply
this is really interesting, where/how did you learn this?

I'd like to learn more about these things - brain regions, connections, functions - and what they might imply about the kinds of computations that are going on, but my background is mainly on the AI/math side of things.

[+] jostmey|9 years ago|reply
Neurogenesis? How about neural death as a way to prune large neural networks into more compact ones--now that is a research idea!
[+] kajecounterhack|9 years ago|reply
https://arxiv.org/abs/1503.02531

Modern applications of small networks regularly reduce sizes from larger state-of-the-art networks using distillation. Distillation compacts neural networks while affecting accuracy minimally.

Instead of pruning directly from the large network, just learn how it generalizes. Takes fewer nodes / overall operations (Multiplications / Additions).
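The softened-targets mechanism from the distillation paper linked above can be sketched in a few lines (the temperature value and logits here are illustrative, not from the paper):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Softmax with temperature T; higher T gives softer distributions."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                      # for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Cross-entropy between the softened teacher and student outputs.
    The small student learns how the big teacher generalizes (the relative
    probabilities it assigns to wrong classes), not just hard labels."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -np.sum(p_teacher * np.log(p_student + 1e-12))

teacher = [8.0, 2.0, -1.0]   # confident outputs of the large network
student = [5.0, 3.0, 0.0]    # current outputs of the small network
loss = distillation_loss(student, teacher)
```

Raising the temperature exposes the teacher's "dark knowledge" about which wrong classes are nearly right, which is what lets the smaller network match its generalization.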

[+] antome|9 years ago|reply
This idea is at least partially in use with regularisation and dropout. The difference, at least with dropout, is that the "killed" neurons are then massaged back into the network in order to become useful again.
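That "massaged back" behaviour is visible in a minimal inverted-dropout sketch: units are only zeroed per training pass (a fresh mask each time), and all of them return at test time:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5, train=True):
    """Inverted dropout: during training, zero each unit with probability p
    and scale survivors by 1/(1-p) so expected activations are unchanged.
    Nothing is permanently killed: a fresh mask is drawn every pass, and
    at test time every unit participates again."""
    if not train:
        return activations
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

h = np.ones(10)
h_train = dropout(h)               # some units zeroed for this pass only
h_test = dropout(h, train=False)   # full network at inference
```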
[+] 10b5-1|9 years ago|reply
Looks like these researchers are trying to make a network more adaptive. I think that deleting nodes would only make it worse at the task it's currently being trained on, as well as worse at the tasks it's being adapted to.

You could train a model using neurogenesis to increase its accuracy, and then use distillation to train a smaller network to comparable accuracy.

These are two very different, but complementary, problems.

[+] chriswarbo|9 years ago|reply
As others have mentioned, there are approaches like regularisation and dropout which try to do similar things. What I find interesting is the fact there are two reasons to do this: to generalise/avoid-overfitting and to reduce resource usage.

It seems like almost all effort is spent on the former, since everyone's aiming for higher accuracy numbers. Are there any widely-used methods to tackle the latter?

For example, I'm imagining a system which is either given measurements of its resource usage (time, memory, etc.) or uses some simple predictive model (e.g. time ~ number of layers * some constant), and works within some resource bound:

- If we're below the bound, expand the model (add neurons, etc.) to allow accuracy increases (note "allow": it's ok to ignore/regularise-to-zero the extra parameters to avoid overfitting)

- If we're above the bound, prune the model (in a way which tries to preserve accuracy)

- Allocate resources to optimise some objective, e.g. reduce variance by pruning the parameters of the best-performing class/predictor/etc. and using those resources to expand the worst performer.
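A toy version of that grow/prune controller, assuming the simple linear cost model mentioned (time ~ number of units * some constant); the function name and numbers are illustrative:

```python
def resize_within_budget(n_units, cost_per_unit, budget):
    """Grow the model while the predicted cost (toy linear model:
    time ~ units * constant) stays under budget; prune while over it.
    A real version would add/remove actual parameters and would try to
    preserve accuracy when pruning."""
    while (n_units + 1) * cost_per_unit <= budget:
        n_units += 1      # below the bound: allow accuracy-increasing growth
    while n_units * cost_per_unit > budget and n_units > 1:
        n_units -= 1      # above the bound: prune back within budget
    return n_units
```

Either direction converges to the largest model the budget admits, e.g. `resize_within_budget(10, 2.0, 30.0)` and `resize_within_budget(25, 2.0, 30.0)` both settle at 15 units.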

The closest thing I know of are artificial economies, but they seem to be more like a selection mechanism (akin to genetic programming) than a direct optimisation procedure (like gradient descent on an ANN).

[+] hyperbovine|9 years ago|reply
So basically, give your deep networks drugs and alcohol.
[+] groar|9 years ago|reply
Basically trying to achieve a certain level of plasticity in deep neural nets by getting inspiration from https://en.wikipedia.org/wiki/Adult_neurogenesis
[+] andreyk|9 years ago|reply
To add on to this - they "specifically consider the case of adding new nodes to pre-train a stacked deep autoencoder", by basically keeping track of when certain layers cannot reproduce their input and then adding more nodes+retraining with both new (not reproduced) and old data. It is quite intuitive, basically the most naive and obvious first attempt at the problem (not meant in a condescending way, just want to point out it's not that generalizable and is pretty ad-hoc).
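A rough sketch of that trigger logic for one autoencoder layer; the thresholds, initialization, and the tied-weight layer itself are simplified stand-ins, not the paper's actual procedure:

```python
import numpy as np

def reconstruction_error(x, W, b_enc, b_dec):
    """One tied-weight autoencoder layer: encode, decode, squared error."""
    h = np.tanh(x @ W + b_enc)
    x_hat = h @ W.T + b_dec
    return float(np.sum((x - x_hat) ** 2))

def maybe_grow(W, b_enc, errors, threshold, max_outliers):
    """If enough samples reconstruct poorly, add one hidden unit
    (randomly initialized here; the paper then trains the new node on
    the outliers and replays old data to stabilize the rest)."""
    n_outliers = sum(e > threshold for e in errors)
    if n_outliers >= max_outliers:
        new_unit = np.random.default_rng(0).normal(scale=0.1, size=(W.shape[0], 1))
        W = np.hstack([W, new_unit])
        b_enc = np.append(b_enc, 0.0)
    return W, b_enc

W, b_enc = np.zeros((4, 3)), np.zeros(3)          # 4 inputs, 3 hidden units
W, b_enc = maybe_grow(W, b_enc, errors=[2.0, 1.5, 0.1],
                      threshold=1.0, max_outliers=2)   # grows to 4 units
```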
[+] argonaut|9 years ago|reply
Sorry if I'm being snobbish, but I do wonder why this paper is only being submitted to IJCNN, a 2nd tier machine learning conference. I know students who publish undergrad research at workshops with lower acceptance rates than IJCNN. I can't think of any important machine learning papers published in IJCNN in the recent past.
[+] habitue|9 years ago|reply
It depends on what conclusions you're trying to draw from that information. What conference a paper was accepted to is a second-order signal of the noteworthiness. It's probably easier for someone versed in the field to just read the paper to determine if it's interesting. If you're using the conference as a quick pass/fail as you skim through the abstracts of hundreds of papers, ok, but you probably wouldn't make time to comment on HN about it in that case.

This paper looks like it builds on pretty well-known techniques like stacked autoencoders, so let's see what first-order noteworthiness data we can gather from a quick skim of the paper. If I had to guess why it wasn't accepted into a better conference:

- It uses stacked autoencoders, which are pretty out of fashion

- It bothers reporting results on MNIST

- (more subjectively) It pulls an unfortunately common technique of saying "here's something the brain does" and then hand-waving that it's a deep reason why a technique they've come up with is useful, when in fact the relationship is just "inspired by the general idea of", not "performs the same function as" the biological mechanism. In this case, I think the tenuous connection of their technique to research on neurogenesis is pretty flimsy. Clearly neurogenesis is not how an adult human brain forms new memories or gains proficiency in new skills (which they acknowledge in the conclusion)

[+] eruditely|9 years ago|reply
Does it matter? I assume they just wanted to get it out and publish it.
[+] irinarish|9 years ago|reply
A good point was made that a model of neurogenesis must also incorporate neuronal death besides neuronal birth (since the hippocampus, and the brain as a whole, have physical constraints, you can't keep growing your network infinitely :). That's why any model of neurogenesis must incorporate the interplay between birth and death of new (and old) neurons; that was the main idea of the paper I mentioned in an earlier post (this year's ICLR submission https://openreview.net/forum?id=HyecJGP5ge). Note that just adding nodes to networks was proposed before, e.g. the classical work on cascade correlation.
[+] irinarish|9 years ago|reply
For a model that incorporates both neuronal birth and death, see ICLR submission at OpenReview:

https://openreview.net/revisions?id=HyecJGP5ge

NEUROGENESIS-INSPIRED DICTIONARY LEARNING: ONLINE MODEL ADAPTION IN A CHANGING WORLD Sahil Garg, Irina Rish, Guillermo Cecchi, Aurelie Lozano

[+] gallerdude|9 years ago|reply
Very wishful thinking on my part, but I think we're far closer to a general intelligence than most expect.
[+] empath75|9 years ago|reply
I think what we might see is a kind of autonomous corporation that is nominally under the control of shareholders, a CEO or a board, but which makes decisions without very much or any human input, and which gains some amount of legal rights through corporate personhood.

It won't be a 'general ai', though. More like a set of loosely connected systems that operate 'in the best interests of the shareholders', however that's defined.

It's pretty much the end state of the trend of pushing decision making to algorithms to remove moral and legal culpability from individuals.

[+] amelius|9 years ago|reply
Well, an interesting property of the brain is that any I/O relation happens within X milliseconds, which puts a limit on the depth of the network (if the speed of a neuron is limited). It would be nice to have some hard numbers on this.
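A back-of-the-envelope version of that depth bound (essentially the classic "100-step" argument; the latency numbers below are order-of-magnitude assumptions, not measurements):

```python
# Assumptions: a single serial neural processing/transmission step takes a
# few milliseconds, and a fast recognition response completes within a
# couple hundred milliseconds end to end.
reaction_time_ms = 200   # assumed I/O latency bound of the whole pathway
step_time_ms = 2         # assumed time per serial neural step
max_serial_depth = reaction_time_ms // step_time_ms
# at most ~100 sequential steps, bounding the effective "depth" of the
# computation between input and output
```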
[+] sharemywin|9 years ago|reply
general intelligence seems way overrated to me...

I have plenty of wants and desires that could take a whole army of idiot savants working 24/7 to fulfill.

[+] hmate9|9 years ago|reply
Out of curiosity: how far away do you think we are?
[+] iverjo|9 years ago|reply
How does this relate to Progressive Neural Networks [0]? That technique is also about accumulating knowledge (while not forgetting existing knowledge).

[0] https://arxiv.org/abs/1606.04671

[+] joantune|9 years ago|reply
It never ceases to amaze me that some of the best steps towards achieving AI come from looking at how we perceive that a neuron works and simulating it.

And the thing is, we aren't exactly sure why that is.. it's amazing.

Sometimes the best thing we can do is imitate nature

[+] throwaway287391|9 years ago|reply
I completely blame my own community, rather than you, for writing this, but as an AI researcher, your comment is terribly painful to read. We have little to no idea how actual neurons (let alone entire brains) really work. The things that are often called "(artificial) neural networks" really shouldn't be called that. I strongly prefer terms like "computational networks" or (where applicable) "recurrent/convolutional networks".
[+] spott|9 years ago|reply
This isn't strictly true though.

Spiking Neural Networks [0] attempt to be more accurate representations of human neurons, but haven't really caught on because they aren't really much better than our perceptron model of neurons, at least for the things we are trying to do with them.

[0] http://www.ane.pl/pdf/7146.pdf
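For a concrete sense of the difference, here is a minimal leaky integrate-and-fire (LIF) neuron, the simplest spiking model; parameters are illustrative:

```python
def lif_neuron(input_current, dt=1.0, tau=10.0, v_thresh=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron. The membrane potential leaks
    toward rest, integrates input over time, and emits a discrete spike
    (then resets) on crossing threshold -- unlike a perceptron, the
    output is a spike train in time, not a single real value."""
    v, spikes = 0.0, []
    for i in input_current:
        v += dt / tau * (-v + i)     # leaky integration step
        if v >= v_thresh:
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return spikes
```

With zero input it stays silent; with a sustained strong input it fires periodically, which is the time-coded behaviour the perceptron abstraction throws away.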

[+] partycoder|9 years ago|reply
Well, neurons have many properties... their information processing capabilities are one aspect, but they also deal with the physical level of communication and staying healthy.

Neurons are also a family of cells, and are very diverse in shapes and functions. We tend to oversimplify our representation of neurons. There are simple neurons and then you have neurons like the Purkinje cell that are massive.

Neurons also rely on their counterparts, the glial cells, which are much less often mentioned.

I think because of this, it will be a while until we fully understand the role of each one of them.

[+] visarga|9 years ago|reply
We're not simulating brain neurons:

- real neurons are stochastic and communicate through spikes, artificial neurons can communicate real values efficiently

- real neurons are more like automatons: they have a dynamic in time, and learning happens as a continuous interaction with only their neighbors; artificial neurons are "static" (use discrete time), are implemented by forward and backward passes, and can also use nonlocal information

- real neurons can't backpropagate, because backprop requires the transmission of gradients back along the same connections, but in reverse - brain connections don't support that kind of bidirectional data flow; artificial neurons work best by backprop

- real neurons can't implement convolutions, as that would require a neuron to slide over a field; real neurons also can't implement RNNs as they are, and don't use backpropagation through time (BPTT)

So, artificial neurons are much less hampered and can do many things that real neurons can't do, or have to do by some less efficient method. That means brain neurons still have some tricks up their sleeve. Artificial neurons are quite different from brain neurons, and rightly so, because they can be more efficient that way.

[+] spynxic|9 years ago|reply
I find that somewhat strange.

Why attribute the idea of introducing new nodes to a graph to biological concepts? It seems like a simple step in exploration, similar to how one might think to vary the weights of the nodes randomly over some range.. unless there is some technique biology uses to pre-configure the nodes upon introduction to the network, that might be rather interesting.

[+] m3kw9|9 years ago|reply
A nicely coined term for just another deep learning method.
[+] hmate9|9 years ago|reply
Slightly off topic, but I hate how publications are written. It seems like authors are purposely using big words and sentences that are often 5-6 lines long in order to make it seem more clever.

I find myself often having to reread a sentence in order to understand it.

These algorithms are often very simple and can be easily explained. Don't overcomplicate them.

[+] bicubic|9 years ago|reply
A lot of the time the verbosity isn't so much to sound more clever as it is to be very specific and explicit about what the author is trying to convey. There's a lot of changing assumed knowledge and jargon in various fields and our use of language changes over time. The publication writing style is an attempt to factor that out.
[+] amelius|9 years ago|reply
Then here's a challenge: could you write the abstract of the article in "simple English", without changing the meaning?
[+] shmageggy|9 years ago|reply
Although I just glanced at a few parts of this, I did not find it to be poorly written. Can you give an example where you thought it was too verbose or unnecessarily complex?
[+] paulsutter|9 years ago|reply
TL;DR:

- "We specifically consider the case of...a stacked deep autoencoder (AE), which is a type of neural network designed to encode a set of data samples such that they can be decoded to produce data sample reconstructions with minimal error

- "The first step of the NDL algorithm occurs when a set of new data points fail to be appropriately reconstructed by the trained network...When a data sample’s RE is too high, the assumption is that the AE level under examination does not contain a rich enough set of features to accurately reconstruct the sample.

- "The second step of the NDL algorithm is adding and training a new node, which occurs when a critical number of input data samples (outliers) fail to achieve adequate representation at some level of the network.

- "The final step of the NDL algorithm is intended to stabilize the network’s previous representations in the presence of newly added nodes. It involves training all the nodes in a level with both new data and replayed samples from previously seen classes on which the network has been trained.