"Previous attempts to combine RL with neural networks had largely failed due to unstable learning. To address these instabilities, our Deep Q-Networks (DQN) algorithm stores all of the agent's experiences and then randomly samples and replays these experiences to provide diverse and decorrelated training data."
[+] [-] jamessb|9 years ago|reply
The comparison to dreaming reminds me of a comment in Information Theory, Inference, and Learning Algorithms:
"One way of viewing the two terms in the gradient (43.9) is as 'waking' and 'sleeping' rules. While the network is 'awake', it measures the correlation between x_i and x_j in the real world, and weights are increased in proportion. While the network is 'asleep', it 'dreams' about the world using the generative model (43.4), and measures the correlations between x_i and x_j in the model world; these correlations determine a proportional decrease in the weights. If the second-order correlations in the dream world match the correlations in the real world, then the two terms balance and the weights do not change."
[+] [-] seanwilson|9 years ago|reply
When it's playing a game (e.g. Breakout) and being fed the pixels on the screen, how is the AI told what the score/progress is? Does it have access to some numeric metric chosen by the researchers for each game?
[+] [-] ehsanu1|9 years ago|reply
Yes, the score is fed in as the reward value. If I'm not mistaken, they didn't normalise it across games for the initial paper, but normalised it to some range in some of the follow-up experiments.
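For reference, the Nature paper clips each per-step score change to the range [-1, 1] rather than normalising per game, so a single learning rate can work across games with very different score scales. A one-line sketch of that clipping:

```python
def clip_reward(score_delta):
    # Clip the per-step score change to [-1, 1], as in the Nature DQN paper,
    # so games with wildly different score magnitudes produce comparable rewards.
    return max(-1.0, min(1.0, float(score_delta)))
```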
[+] [-] seanwilson|9 years ago|reply
This would be really interesting to see. I'm curious whether Google would avoid this game, though, since the media would likely have a field day reporting on a murderous Google-developed AI. Technically we already have this in games anyway (e.g. bots in FPS games).
[+] [-] awwaiid|9 years ago|reply
... so, made the machines dream. Fancy!
[+] [-] cosmoharrigan|9 years ago|reply
Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching
http://link.springer.com/article/10.1023/A:1022628806385
as well as Long-Ji Lin's excellent 1993 PhD thesis:
Reinforcement Learning for Robots Using Neural Networks
http://www.dtic.mil/dtic/tr/fulltext/u2/a261434.pdf
[+] [-] sanxiyn|9 years ago|reply
For example, Breakout stores its score in addresses 76 and 77. The Arcade Learning Environment has code to read the score, one routine per game. The code for Breakout is here: https://github.com/mgbellemare/Arcade-Learning-Environment/b...
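As an illustration, a per-game score reader might decode those two RAM bytes like this. This is a hedged sketch: Atari games commonly store scores as binary-coded decimal, but the exact layout here (including which byte holds the high digits) is an assumption; the linked ALE source is authoritative.

```python
def bcd_to_int(byte):
    # Atari RAM bytes often pack two decimal digits (binary-coded decimal):
    # the high nibble is the tens digit, the low nibble the ones digit.
    return (byte >> 4) * 10 + (byte & 0x0F)

def breakout_score(ram):
    # Hypothetical reader combining the two score bytes at addresses 76 and 77.
    # (Assumes byte 77 holds the high digits; check the ALE source for Breakout.)
    return bcd_to_int(ram[77]) * 100 + bcd_to_int(ram[76])

# Toy RAM snapshot: 128 bytes, with the score bytes set so the score reads 342.
ram = [0] * 128
ram[77] = 0x03  # high digits "03"
ram[76] = 0x42  # low digits "42"
```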