They simulate neurogenesis, I guess, but they do not incorporate the most interesting part of that neurogenesis: the new neurons are born into the dentate gyrus, a region thought to have a particular capacity to orthogonalize feature representations that are similar (i.e. pattern separation), allowing distinct memories to be formed for similar events. The dentate gyrus outputs to a region called Cornu Ammonis 3 (CA3), which is heavily recurrent and thought to be able to pattern-complete a full representation from partial inputs. That is, CA3 can encode and retrieve the relations between two or more features or objects.
For a mathematical model and review one might read:
Rolls (2013)
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3812781/
but many others exist. I'd write more, but typing on my phone is driving me to distraction.
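The pattern-completion idea can be illustrated with a toy Hopfield-style associative memory (my own sketch, not from the Rolls paper): store two orthogonal patterns, then recover one in full from a corrupted cue, loosely analogous to CA3 completing a representation from partial input.

```python
# Toy Hopfield-style associative memory: "pattern completion" in miniature.
def train(patterns):
    # Hebbian outer-product weights, zero diagonal.
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W

def recall(W, x, steps=5):
    # Synchronous sign updates until (hopefully) a stored pattern is reached.
    for _ in range(steps):
        x = [1 if sum(W[i][j] * x[j] for j in range(len(x))) >= 0 else -1
             for i in range(len(x))]
    return x

p1 = [1, 1, 1, 1, -1, -1, -1, -1, 1, 1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1]  # orthogonal to p1
W = train([p1, p2])
cue = list(p1); cue[0] = -cue[0]; cue[5] = -cue[5]  # corrupt two bits
print(recall(W, cue) == p1)  # → True: the full pattern is recovered
```

This is of course a cartoon of CA3's recurrent dynamics, not a biological model.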
This is really interesting; where/how did you learn this?
I'd like to learn more about these things - brain regions, connections, functions - and what they might imply about the kinds of computations that are going on, but my background is mainly on the AI/math side of things.
Modern applications of small networks regularly get their size down from larger state-of-the-art networks using distillation. Distillation compacts neural networks while affecting accuracy only minimally.
Instead of pruning directly from the large network, you just learn how it generalizes. That takes fewer nodes and fewer overall operations (multiplications/additions).
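The core of distillation is training the small network on the large network's softened output distribution rather than on hard labels. A minimal sketch of the loss (the logits here are made up for illustration):

```python
# Distillation loss sketch: cross-entropy between the teacher's and student's
# temperature-softened output distributions. Logits below are illustrative.
import math

def softmax(logits, T=1.0):
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(student_logits, teacher_logits, T=4.0):
    # High temperature T exposes the teacher's "dark knowledge": the relative
    # probabilities it assigns to wrong classes.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

teacher = [8.0, 2.0, -1.0]   # confident large network
student = [4.0, 1.0, -0.5]   # smaller network with a similar ranking
print(distill_loss(student, teacher) < distill_loss([0.0, 0.0, 0.0], teacher))  # → True
```

The loss is lower the more closely the student's distribution mimics the teacher's; in practice it is usually mixed with the ordinary hard-label loss.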
This idea is at least partially in use with regularisation and dropout. The difference, at least with dropout, is that the "killed" neurons are then massaged back into the network in order to become useful again.
It looks like these researchers are trying to make a network more adaptive. I think that deleting nodes would only make it worse both at the task it's currently being trained on and at the tasks it's being adapted to.
You could train a model using neurogenesis to increase its accuracy, and then use distillation to train a smaller network to comparable accuracy.
But these are two very different, but complementary, problems.
As others have mentioned, there are approaches like regularisation and dropout which try to do similar things. What I find interesting is the fact there are two reasons to do this: to generalise/avoid-overfitting and to reduce resource usage.
It seems like almost all effort is spent on the former, since everyone's aiming for higher accuracy numbers. Are there any widely-used methods to tackle the latter?
For example, I'm imagining a system which is either given measurements of its resource usage (time, memory, etc.) or uses some simple predictive model (e.g. time ~ number of layers * some constant), and works within some resource bound:
- If we're below the bound, expand the model (add neurons, etc.) to allow accuracy increases (note "allow": it's ok to ignore/regularise-to-zero the extra parameters to avoid overfitting)
- If we're above the bound, prune the model (in a way which tries to preserve accuracy)
- Allocate resources to optimise some objective, e.g. reduce variance by pruning the parameters of the best-performing class/predictor/etc. and using those resources to expand the worst performer.
The closest thing I know of are artificial economies, but they seem to be more like a selection mechanism (akin to genetic programming) than a direct optimisation procedure (like gradient descent on an ANN).
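The grow/prune loop described above can be sketched as a simple controller; everything here (the cost model, step size, numbers) is a made-up illustration, not an existing method:

```python
# Hypothetical resource-bounded controller: a predictive cost model
# (cost ~ n_params * cost_per_param) and a fixed budget decide whether
# to grow or prune the model.
def adjust_size(n_params, cost_per_param, budget, step=16):
    cost = n_params * cost_per_param
    if cost < budget:
        return n_params + step           # below the bound: allow the model to grow
    if cost > budget:
        return max(1, n_params - step)   # above the bound: prune back
    return n_params

n = 100
for _ in range(10):
    n = adjust_size(n, cost_per_param=2, budget=300)
print(n)  # settles into oscillating just around the budget (148 <-> 164 here)
```

A real version would make the grow/prune steps accuracy-aware (regularise new parameters toward zero, prune the least useful ones), as the bullet points suggest.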
For that, check out our OpenReview ICLR submission, "Neurogenesis-Inspired Dictionary Learning: Online Model Adaption in a Changing World", by Sahil Garg, Irina Rish, Guillermo Cecchi, Aurelie Lozano:
https://openreview.net/revisions?id=HyecJGP5ge
To add on to this - they "specifically consider the case of adding new nodes to pre-train a stacked deep autoencoder", basically by keeping track of when certain layers cannot reproduce their input and then adding more nodes and retraining with both new (not-reproduced) and old data. It is quite intuitive - basically the most naive and obvious first attempt at the problem (not meant in a condescending way; I just want to point out that it's not that generalizable and is pretty ad hoc).
Sorry if I'm being snobbish, but I do wonder why this paper is only being submitted to IJCNN, a 2nd tier machine learning conference. I know students who publish undergrad research at workshops with lower acceptance rates than IJCNN. I can't think of any important machine learning papers published in IJCNN in the recent past.
It depends on what conclusions you're trying to draw from that information. What conference a paper was accepted to is a second-order signal of the noteworthiness. It's probably easier for someone versed in the field to just read the paper to determine if it's interesting. If you're using the conference as a quick pass/fail as you skim through the abstracts of hundreds of papers, ok, but you probably wouldn't make time to comment on HN about it in that case.
This paper looks like it builds on pretty well-known techniques like stacked autoencoders, so let's see what first-order noteworthiness data we can gather from a quick skim of the paper. If I had to guess why it wasn't accepted into a better conference:
- It uses stacked autoencoders, which are pretty out of fashion
- It bothers reporting results on MNIST
- (more subjectively) It pulls an unfortunately common technique of saying "here's something the brain does" and then hand-waving that it's a deep reason why a technique they've come up with is useful, when in fact the relationship is just "inspired by the general idea of", not "performs the same function as" the biological mechanism. In this case, I think the tenuous connection of their technique to research on neurogenesis is pretty flimsy. Clearly neurogenesis is not how an adult human brain forms new memories or gains proficiency in new skills (which they acknowledge in the conclusion)
A good point was made that a model of neurogenesis must incorporate neuronal death besides neuronal birth (since the hippocampus, and the brain as a whole, has physical constraints - you can't keep growing your network infinitely :). That's why any model of neurogenesis must capture the interplay between the birth and death of new (and old) neurons; that was the main idea of the paper I mentioned in an earlier post (this year's ICLR submission https://openreview.net/forum?id=HyecJGP5ge)
Note that just adding nodes to networks was proposed before, e.g. in the classical work on cascade correlation.
I think what we might see is a kind of autonomous corporation that is nominally under the control of shareholders, a CEO or a board, but which makes decisions without very much or any human input, and which gains some amount of legal rights through corporate personhood.
It won't be a 'general ai', though. More like a set of loosely connected systems that operate 'in the best interests of the shareholders', however that's defined.
It's pretty much the end state of the trend of pushing decision making to algorithms to remove moral and legal culpability from individuals.
Well, an interesting property of the brain is that any I/O relation happens within X milliseconds, which puts a limit on the depth of the network (if the speed of a neuron is limited). It would be nice to have some hard numbers on this.
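Some rough numbers do exist: the classic back-of-envelope argument (the "100-step rule") divides a fast behavioral response time by the per-stage neural delay. The figures below are approximate assumptions for illustration only:

```python
# Back-of-envelope bound on the serial depth of a brain computation.
# Assumed numbers (illustrative): ~500 ms for a complex recognition response,
# ~5 ms per neural processing stage (spike generation + synaptic delay).
response_ms = 500
per_stage_ms = 5
max_depth = response_ms // per_stage_ms
print(max_depth)  # → 100 serial steps at most
```

The obvious caveat is that the brain is massively parallel, so limited serial depth says little about total computation per response.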
I completely blame my own community, rather than you, for writing this, but as an AI researcher I find your comment terribly painful to read. We have little to no idea how actual neurons (let alone entire brains) really work. The things that are often called "(artificial) neural networks" really shouldn't be called that. I strongly prefer terms like "computational networks" or (where applicable) "recurrent/convolutional networks".
Spiking Neural Networks [0] attempt to be more accurate representations of human neurons, but haven't really caught on, because they aren't really much better than our perceptron model of neurons, at least for the things we are trying to do with them.

[0] http://www.ane.pl/pdf/7146.pdf
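For a sense of what "spiking" means, here is a minimal leaky integrate-and-fire (LIF) neuron, the usual starting point for spiking models; the parameters are illustrative, not fit to biology:

```python
# Leaky integrate-and-fire neuron: membrane voltage leaks toward rest,
# integrates input, and emits a spike (then resets) on crossing threshold.
def lif(inputs, dt=1.0, tau=10.0, v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    v, spikes = v_rest, []
    for i in inputs:
        v += dt / tau * (-(v - v_rest) + i)  # leak + integrate
        if v >= v_thresh:
            spikes.append(1)
            v = v_reset                      # fire and reset
        else:
            spikes.append(0)
    return spikes

out = lif([1.5] * 30)  # constant drive yields a regular spike train
print(sum(out))        # → 2 spikes in 30 time steps
```

Note the contrast with a perceptron-style unit: the output is a train of discrete events in time, not a single real value per forward pass.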
Well, neurons have many properties... their information processing capabilities are one aspect, but they also deal with the physical level of communication and staying healthy.
Neurons are also a family of cells, and are very diverse in shapes and functions. We tend to oversimplify our representation of neurons. There are simple neurons and then you have neurons like the Purkinje cell that are massive.
Neurons also rely on their counterparts, the glial cells, that are much less often mentioned.
I think because of this, it will be a while until we fully understand the role of each one of them.
- real neurons are stochastic and communicate through spikes; artificial neurons can communicate real values efficiently
- real neurons are more like automata: they have dynamics in time, and learning happens as a continuous interaction with only their neighbors; artificial neurons are "static" (use discrete time), are implemented by a forward and backward pass, and can also use nonlocal information
- real neurons can't backpropagate, because backprop requires transmitting gradients back along the same connections, but in reverse - brain connections don't support that kind of bidirectional data flow; artificial neurons work best with backprop
- real neurons can't implement convolutions, since that would require a neuron to slide over a field; real neurons also can't implement RNNs as they are, and don't use backpropagation through time (BPTT)
So, artificial neurons are much less hampered and can do many things that real neurons can't do, or can only do by some less efficient method. That means brain neurons still have some tricks up their sleeve. Artificial neurons are quite different from brain neurons, and that's as it should be, because they can be more efficient that way.
Why attribute the idea of introducing new nodes to a graph to biological concepts? It seems like a simple step in exploration, similar to how one might think to vary the weights of the nodes randomly over some range... unless there is some technique biology uses to pre-configure the nodes upon introduction to the network - that would be rather interesting.
Slightly off topic, but I hate how publications are written. It seems like authors purposely use big words and sentences that are often 5-6 lines long in order to seem more clever.
I find myself often having to reread a sentence in order to understand it.
These algorithms are often very simple and can be easily explained. Don't over complicate them.
A lot of the time the verbosity isn't so much to sound more clever as it is to be very specific and explicit about what the author is trying to convey. There's a lot of changing assumed knowledge and jargon in various fields and our use of language changes over time. The publication writing style is an attempt to factor that out.
Although I just glanced at a few parts of this, I did not find it to be poorly written. Can you give an example where you thought it was too verbose or unnecessarily complex?
- "We specifically consider the case of...a stacked deep autoencoder (AE), which is a type of neural network designed to encode a set of data samples such that they can be decoded to produce data sample reconstructions with minimal error"
- "The first step of the NDL algorithm occurs when a set of new data points fail to be appropriately reconstructed by the trained network...When a data sample’s RE is too high, the assumption is that the AE level under examination does not contain a rich enough set of features to accurately reconstruct the sample."
- "The second step of the NDL algorithm is adding and training a new node, which occurs when a critical number of input data samples (outliers) fail to achieve adequate representation at some level of the network."
- "The final step of the NDL algorithm is intended to stabilize the network’s previous representations in the presence of newly added nodes. It involves training all the nodes in a level with both new data and replayed samples from previously seen classes on which the network has been trained."
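The three quoted steps can be sketched as a toy (this is my naive illustration, not the authors' algorithm or code; "features" here are just stored prototypes standing in for autoencoder nodes):

```python
# Toy sketch of the quoted loop: flag samples with high reconstruction error,
# and when enough outliers accumulate, add a new "node" (prototype) for them.
def reconstruction_error(features, x):
    # Nearest-prototype "autoencoder": reconstruct x as its closest feature.
    best = min(features, key=lambda f: sum((a - b) ** 2 for a, b in zip(f, x)))
    return sum((a - b) ** 2 for a, b in zip(best, x))

def adapt(features, new_data, threshold=0.5, critical=2):
    # Step 1: detect samples the current features cannot reconstruct.
    outliers = [x for x in new_data if reconstruction_error(features, x) > threshold]
    # Step 2: add a node once a critical number of outliers accumulates.
    if len(outliers) >= critical:
        mean = [sum(col) / len(outliers) for col in zip(*outliers)]
        features = features + [mean]
        # Step 3 (elided here) would retrain all nodes on the new data
        # plus replayed samples from previously seen classes.
    return features

feats = [[0.0, 0.0]]
feats = adapt(feats, [[3.0, 3.0], [3.2, 2.8], [0.1, 0.0]])
print(len(feats))  # → 2: a node was added for the cluster of outliers
```

The real method does this per autoencoder level with gradient training; this sketch only shows the outlier-triggered growth logic.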
I have plenty of wants and desires that could take a whole army of idiot savants working 24/7 to fulfill.
And the thing is, we aren't exactly sure why that is... it's amazing. Sometimes the best thing we can do is imitate nature.