
Deep Reinforcement Learning

146 points| dcre | 9 years ago |deepmind.com | reply

29 comments

[+] awwaiid|9 years ago|reply
"Previous attempts to combine RL with neural networks had largely failed due to unstable learning. To address these instabilities, our Deep Q-Networks (DQN) algorithm stores all of the agent's experiences and then randomly samples and replays these experiences to provide diverse and decorrelated training data."

... so, made the machines dream. Fancy!
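
For anyone wondering what "stores experiences and randomly samples and replays them" looks like in practice, here is a minimal sketch of an experience replay buffer. It is not DeepMind's code: the class name `ReplayBuffer`, the bounded `capacity`, and the tuple layout are all assumptions made for illustration (the quote says "all" experiences, whereas this sketch drops the oldest ones once it is full).

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=None):
        # Assumption: a bounded buffer; the oldest transitions are evicted when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up the temporal
        # correlation between consecutive transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Usage: fill the buffer with dummy transitions, then draw a training batch.
buffer = ReplayBuffer(capacity=1000, seed=0)
for t in range(50):
    buffer.add(state=t, action=t % 4, reward=0.0, next_state=t + 1, done=False)
batch = buffer.sample(8)
```

The point of sampling at random rather than training on the latest frames is that consecutive frames are nearly identical, so a batch drawn across the whole history gives more varied training data.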

[+] ehsanu1|9 years ago|reply
Aha, so that's what (human) dreaming is for.
[+] jamessb|9 years ago|reply
The comparison to dreaming reminds me of a comment in Information Theory, Inference, and Learning Algorithms:

"One way of viewing the two terms in the gradient (43.9) is as 'waking' and 'sleeping' rules. While the network is 'awake', it measures the correlation between x_i and x_j in the real world, and weights are increased in proportion. While the network is 'asleep', it 'dreams' about the world using the generative model (43.4), and measures the correlations between x_i and x_j in the model world; these correlations determine a proportional decrease in the weights. If the second-order correlations in the dream world match the correlations in the real world, then the two terms balance and the weights do not change."

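For reference, the two terms in that passage correspond to the standard Boltzmann machine learning rule. In outline it has the following form (the notation here is an assumption and is not copied from the book's equation 43.9):

```latex
\frac{\partial}{\partial w_{ij}} \ln P\big(\{x^{(n)}\} \mid W\big)
  \;\propto\;
  \underbrace{\langle x_i x_j \rangle_{\text{data}}}_{\text{``waking'' term}}
  \;-\;
  \underbrace{\langle x_i x_j \rangle_{\text{model}}}_{\text{``sleeping'' term}}
```

When the two correlations are equal the gradient is zero, which is the "terms balance and the weights do not change" case in the quote.
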
[+] syngrog66|9 years ago|reply
next up: make them dream of electric sheep
[+] mtgx|9 years ago|reply
Let's hope Google doesn't feed it video games like Battlefield, where it learns how to most effectively kill humans.
[+] seanwilson|9 years ago|reply
When it's playing a game (e.g. Breakout) and it's being fed the pixels on the screen, how is the AI being told what the score/progress is? Does it have access to some numeric metric that is chosen by the researchers for each game?
[+] msohcw|9 years ago|reply
Yes, it's fed the score as the reward value used. If I'm not wrong, they didn't normalise it across games for the initial paper but normalised it to some range for some of the following research experiments.
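
A rough sketch of what that could look like, assuming the reward is the change in game score between steps and that "normalised to some range" means clipping to a fixed interval. The function name `score_to_reward` and the `clip` parameter are hypothetical, not taken from the paper.

```python
def score_to_reward(prev_score, new_score, clip=None):
    """Turn a change in game score into a reward signal.

    Assumption: reward = score delta; if `clip` is given, the reward is
    limited to the interval [-clip, clip] so games with very different
    score scales produce comparable rewards.
    """
    reward = float(new_score - prev_score)
    if clip is not None:
        reward = max(-clip, min(clip, reward))
    return reward


# Usage: raw score delta vs. the same delta clipped to [-1, 1].
raw = score_to_reward(10, 17)
clipped = score_to_reward(10, 17, clip=1.0)
```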
[+] Dolores12|9 years ago|reply
They can measure the time you played. More time in the game means a higher score.
[+] tintor|9 years ago|reply
Labyrinth? I have a feeling that Doom is next.
[+] seanwilson|9 years ago|reply
> Labyrinth? I have a feeling that Doom is next.

This would be really interesting to see. I'm curious whether Google would avoid this game, though, since the media would likely have a field day reporting on a murderous Google-developed AI if they did this. Technically we already have this in games anyway (e.g. bots in FPS games).