As someone working on a reinforcement learning/neuroevolution problem right now, I find this extremely exciting. Fewer parameters, ceteris paribus, are always better; the fact that the experiments in this paper were run on one workstation, rather than on a massive farm of TPUs à la AlphaGo, implies quicker development iteration and more accessibility for the average researcher.
The staging of components in this paper (compressor/controller), where neuroevolution is only applied to a low-dimensional controller, reminds me of Ha and Schmidhuber's recent paper on world models (which is briefly cited) [1]. They employ a variational autoencoder with ~4.4M parameters, an RNN with ~1.7M parameters, and a final controller with just 1,088 parameters! Though it's recently been shown that neuroevolution can scale to millions of parameters [2], the technique of applying evolution to as few parameters as possible and supplementing with either autoencoders or vector quantization seems to be gaining traction. I hope to apply some of the ideas in this paper to multiple co-evolving agents...
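The scale gap is easy to check with arithmetic on the layer shapes. A minimal sketch, using assumed sizes (a 32-dim latent, a 256-unit RNN state, 3 actions; these are illustrative numbers, not figures from the paper):

```python
def linear_controller_params(latent_dim, hidden_dim, action_dim):
    """Weights plus biases for one linear layer mapping the
    concatenated [latent; hidden] vector to actions."""
    return action_dim * (latent_dim + hidden_dim) + action_dim

# With the assumed sizes above:
print(linear_controller_params(32, 256, 3))  # 867 -- the same ~1e3 ballpark
                                             # as the 1,088 quoted, vs. the
                                             # millions in the VAE and RNN
```

A search space of under a thousand numbers is small enough for even simple evolution strategies, which is the whole point of the staging.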
Author here. The idea is low-hanging indeed; several friends (including @togelius!) commented "I always wanted to do that -- eventually". Realizing it is another matter. Have a look at the mess necessary to make it work: we had to discard UL initialization in favor of online learning, accept that the encoding would grow in size, adapt the network sensibly to these changes, and tweak the ES to account for the extra weights.
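The "encoding grows in size" bookkeeping is the subtle part: every time the compressor's dictionary gains an entry, the controller gains weights, and the ES's search distribution has to be extended to match. A toy sketch of that invariant (made-up threshold and sizes, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

ACTIONS, THRESHOLD = 3, 4.0
centroids = [rng.normal(size=16)]  # toy compressor: a dictionary of centroids
# Flat controller weights, one per (centroid, action) pair. This vector
# doubles as the ES mean, so it must grow whenever the encoding grows.
weights = np.zeros(len(centroids) * ACTIONS)

def maybe_grow(frame):
    """If the frame is far from every centroid, add it to the dictionary
    and extend the controller with zero weights, so the policy's output
    is unchanged at the moment of growth."""
    global weights
    if min(np.linalg.norm(frame - c) for c in centroids) > THRESHOLD:
        centroids.append(frame.copy())
        weights = np.concatenate([weights, np.zeros(ACTIONS)])

for _ in range(100):                 # stream of random "frames"
    maybe_grow(rng.normal(size=16))

# The invariant the ES relies on: weight count tracks encoding size.
assert len(weights) == len(centroids) * ACTIONS
```

Zero-initializing the new weights is one way to keep the policy's behavior continuous across a growth step; whether the paper does exactly this, check the code linked below in the thread.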
I have been wolfing down RL articles, videos, and publications for some time now, after an intro to deep learning via Manning's Deep Learning, and while the overall concept of RL is easy to grasp (agents, actions, states, etc.), some of the finer details and processes are quite confusing.
I am tempted to blame inconsistent terminology and implementations for this lack of understanding, but I suspect it has more to do with approaching this field through the lens of a developer rather than a researcher or academic: trying to understand the code without completely grasping the "science" of the mechanisms.
Edit: the point I forgot to mention was that I always feel like I am playing catch-up, as the amount of new content being released exceeds what I can absorb.
Uhm, maybe I should have pointed this out earlier, but the algorithms' implementations can be found (independent of deep neuroevolution) in my Ruby machine learning workbench repo (in turn imported into DNE):
https://github.com/giuse/machine_learning_workbench
Okay, a true story: my second disappointing interaction with Atari marketing.
One fine day my boss came to me and said that he had an ask from Atari Marketing (in the Home Computer arm of the company).
The marketing drone came to my office (yes, we had offices in those days). "My idea is to pre-copyright all possible 8x8 bitmaps so that people can't use them without our permission. Can you print them out for me so we can submit them to the copyright office?" He actually meant all possible 8x8 bitmaps containing five colors, with colors chosen from a 7- or 8-bit space (I forget which).
I told him the story of the guy who supposedly invented chess and was offered a choice of reward by his king. The fellow simply asked, "Just give me one grain of rice for the first square, two grains for the second, four for the third, and so on." Most of you know how this ends; it's grade-school math.
I explained to the marketing guy that the printout would probably outweigh the planet, maybe the solar system, maybe the galaxy. He went away, a little disgusted with those pesky engineers. (I don't know if he was the same oxygen waster who wanted me to write a 16K cartridge in just a couple of weeks, but he certainly was in the same department).
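The grade-school math holds up. Counting the request as "pick a 5-color palette from a 7-bit space, then color 64 pixels" (a rough model; the exact Atari constraints are lost to history):

```python
from math import comb

palettes = comb(128, 5)   # ways to choose 5 colors from a 7-bit (128-color) space
assignments = 5 ** 64     # one of the 5 palette colors for each of 64 pixels
bitmaps = palettes * assignments            # ~1.4e53 bitmaps

# Assume, generously, a million printed bitmaps per gram of paper.
printout_grams = bitmaps / 1e6
MILKY_WAY_GRAMS = 3e45    # ~1.5e12 solar masses at ~2e33 g each

print(f"{bitmaps:.1e}")                     # 1.4e+53
print(printout_grams > MILKY_WAY_GRAMS)     # True: heavier than the galaxy
```

Even at a million bitmaps per gram, the printout outweighs the Milky Way by a comfortable margin, so "maybe the galaxy" was, if anything, understated.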
So I'm still sticking with three brain cells, despite all the downvotes :-)
OK, you're pretty grayed out, but your bio says you've been programming since '79 and that you've written games for Atari. So perhaps all we need is some elaboration? They seem like a successful company, don't they?
pjrule | 7 years ago
[1] https://worldmodels.github.io
[2] https://arxiv.org/abs/1712.06567
spewilly | 7 years ago
Care to elaborate?
kthejoker2 | 7 years ago
"To the best of our knowledge, the only prior work using unsupervised learning as a pre-processor for neuroevolution is (cite)."
Just amazing how much low-hanging fruit there still is in the space.
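The quoted sentence compresses into surprisingly little code: fit any unsupervised compressor to raw observations, then evolve only the tiny controller that reads the codes. A toy sketch with PCA standing in for the compressor and a (1+1)-ES with a 1/5-style step-size rule standing in for the full ES (the task and all sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(42)

# Unsupervised pre-processing: PCA via SVD maps 64-dim "frames" to
# 4-dim codes. No reward signal is involved in this step.
frames = rng.normal(size=(500, 64))
frames -= frames.mean(axis=0)
_, _, vt = np.linalg.svd(frames, full_matrices=False)
codes = frames @ vt[:4].T                       # (500, 4)

# A recoverable toy objective so the sketch has something to optimize.
target = codes @ np.array([1.0, -2.0, 0.5, 3.0])

def fitness(theta):
    """Negative MSE of a 5-parameter linear controller on the codes."""
    pred = codes @ theta[:4] + theta[4]
    return -np.mean((pred - target) ** 2)

# Evolution touches only these 5 parameters, never the compressor.
theta, sigma = np.zeros(5), 0.5
best = fitness(theta)
for _ in range(300):                            # (1+1)-ES
    cand = theta + sigma * rng.normal(size=5)
    f = fitness(cand)
    if f > best:
        theta, best, sigma = cand, f, sigma * 1.1   # success: widen the step
    else:
        sigma *= 0.98                               # failure: shrink it

print(best > fitness(np.zeros(5)))  # True: the 5-parameter controller improved
```

The 1.1/0.98 step-size update settles near a ~1/5 success rate, a classic heuristic for (1+1)-style strategies; the real systems discussed here replace both halves (VAE or vector quantization for the compressor, a proper ES for the search) but keep this same division of labor.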
markatkinson | 7 years ago
Either way, if you feel you're in a similar spot, check out this resource: https://reinforce.io and their GitHub repo: https://github.com/reinforceio/tensorforce.
Just reading through their code and documentation has made a lot of the concepts clearer.
And a few more resources I found really helpful:
http://karpathy.github.io/2016/05/31/rl/
https://www.analyticsvidhya.com/blog/2017/01/introduction-to...
https://www.oreilly.com/ideas/reinforcement-learning-with-te...
kthejoker2 | 7 years ago
https://github.com/giuse/DNE/tree/nips2018