"Previous attempts to combine RL with neural networks had largely failed due to unstable learning. To address these instabilities, our Deep Q-Networks (DQN) algorithm stores all of the agent's experiences and then randomly samples and replays these experiences to provide diverse and decorrelated training data."
[+] [-] jamessb|9 years ago|reply
The comparison to dreaming reminds me of a comment in Information Theory, Inference, and Learning Algorithms:
"One way of viewing the two terms in the gradient (43.9) is as 'waking' and 'sleeping' rules. While the network is 'awake', it measures the correlation between x_i and x_j in the real world, and weights are increased in proportion. While the network is 'asleep', it 'dreams' about the world using the generative model (43.4), and measures the correlations between x_i and x_j in the model world; these correlations determine a proportional decrease in the weights. If the second-order correlations in the dream world match the correlations in the real world, then the two terms balance and the weights do not change."
[+] [-] seanwilson|9 years ago|reply
When it's playing a game (e.g. Breakout) and being fed the pixels on the screen, how is the AI told what the score/progress is? Does it have access to some numeric metric chosen by the researchers for each game?
[+] [-] ehsanu1|9 years ago|reply
Yes, the score is fed in as the reward value. If I'm not mistaken, they didn't normalise it across games for the initial paper, but normalised it to some range in some of the follow-up experiments.
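For reference, the Nature paper clips each per-step score change to the range [-1, 1] rather than normalising per game, so a single learning rate can work across games with very different score scales. A one-line sketch of that clipping:

```python
def clip_reward(score_delta):
    # Clip the per-step score change to [-1, 1], as in the Nature DQN paper,
    # so games with wildly different score magnitudes produce comparable rewards.
    return max(-1.0, min(1.0, float(score_delta)))
```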
[+] [-] seanwilson|9 years ago|reply
This would be really interesting to see. I'm curious whether Google would avoid this game, though, since the media would likely have a field day reporting on a murderous Google-developed AI. Technically we already have this in games anyway (e.g. bots in FPS games).
[+] [-] awwaiid|9 years ago|reply
... so, made the machines dream. Fancy!
[+] [-] cosmoharrigan|9 years ago|reply
Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching
http://link.springer.com/article/10.1023/A:1022628806385
as well as Long-Ji Lin's excellent 1993 PhD thesis:
Reinforcement Learning for Robots Using Neural Networks
http://www.dtic.mil/dtic/tr/fulltext/u2/a261434.pdf
[+] [-] sanxiyn|9 years ago|reply
For example, Breakout stores its score in addresses 76 and 77. The Arcade Learning Environment has code to read the score, one routine per game. The code for Breakout is here: https://github.com/mgbellemare/Arcade-Learning-Environment/b...
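As an illustration, a per-game score reader might decode those two RAM bytes like this. This is a hedged sketch: Atari games commonly store scores as binary-coded decimal, but the exact layout here (including which byte holds the high digits) is an assumption; the linked ALE source is authoritative.

```python
def bcd_to_int(byte):
    # Atari RAM bytes often pack two decimal digits (binary-coded decimal):
    # the high nibble is the tens digit, the low nibble the ones digit.
    return (byte >> 4) * 10 + (byte & 0x0F)

def breakout_score(ram):
    # Hypothetical reader combining the two score bytes at addresses 76 and 77.
    # (Assumes byte 77 holds the high digits; check the ALE source for Breakout.)
    return bcd_to_int(ram[77]) * 100 + bcd_to_int(ram[76])

# Toy RAM snapshot: 128 bytes, with the score bytes set so the score reads 342.
ram = [0] * 128
ram[77] = 0x03  # high digits "03"
ram[76] = 0x42  # low digits "42"
```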