Note: it doesn't learn from pixels but from features read directly from RAM, and it has superhuman reaction time; performance degrades badly when human-like delays are added.
I could see this technology used for the bootstrapping of highly emergent MMO game worlds. It could be used to populate a world with fake "player" NPCs that are actually part of a simulated online ecosystem. Give the NPCs a large enough population, such that players cannot exert significant selection pressure, but give the NPCs real selection pressure through interaction with artificial life evolved with Genetic Algorithms. The rate of evolution of the a-life and the NPCs could be tuned to provide a comfortable rate of change for the human players, and the NPCs would insulate the players from the frustrations GAs might cause.
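As a toy illustration of the selection loop such a GA-driven NPC ecosystem might run (the genome, fitness, and mutation here are purely hypothetical, not anything from the paper):

```python
import random

def evolve(population, fitness, mutate, generations=50, rng=None):
    """Toy truncation-selection GA: keep the fitter half each generation,
    refill the rest with mutated copies of survivors."""
    rng = rng or random.Random(0)
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: len(population) // 2]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(len(population) - len(survivors))]
        population = survivors + children
    return population

# Hypothetical 1-D "genome": evolve values toward a target of 10.
fitness = lambda g: -abs(g - 10.0)
mutate = lambda g, rng: g + rng.gauss(0, 0.5)
best = max(evolve([0.0] * 20, fitness, mutate), key=fitness)
```

Tuning the "comfortable rate of change" mentioned above would amount to throttling `generations` per unit of real time and the mutation step size.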
This reminds me of StarCraft AI experiments. They can't actually make the computer smart, so they just jam 2000 button presses per second down the tube, giving every single unit its own simultaneous AI, and it out-micromanages anyone.
I was similarly disappointed when I read this, but upon further reflection I still like this paper. It is very plausible that both of these problems could be fixed; it would just take a lot more time/power to train, and the resulting system would likely not run in real time, making it impossible to test against real humans.
Further advancement in this area will require huge leaps in hardware performance. Luckily in the next few years I expect that the pace of improvement in specialized hardware for neural nets will far outpace Moore's Law.
Handling delays (and the uncertainty they entail) is a huge challenge, and I think it'll be a rich area of research. The simplest part of the problem is that delays in action or perception also slow the propagation of reward signals, and credit assignment is still a really hard problem.
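A minimal sketch of what imposing human-like delay looks like mechanically, assuming a simple fixed-length buffer sits between the game and the agent (my assumption, not the paper's implementation):

```python
from collections import deque

class DelayedObservations:
    """Deliver observations k frames late, simulating human-like reaction
    delay. A sketch under assumed semantics, not the paper's code."""
    def __init__(self, delay_frames, initial_obs):
        # Pre-fill so the agent has something to see during the first k frames.
        self.buffer = deque([initial_obs] * (delay_frames + 1),
                            maxlen=delay_frames + 1)

    def step(self, new_obs):
        self.buffer.append(new_obs)  # newest frame in, oldest falls out
        return self.buffer[0]        # the agent only ever sees frame t - k

# With a 3-frame delay, at time t the agent sees frame t-3.
env = DelayedObservations(delay_frames=3, initial_obs=0)
seen = [env.step(t) for t in range(1, 8)]  # -> [0, 0, 0, 1, 2, 3, 4]
```

Note how the reward for an action taken at frame t only becomes visible to the agent k frames later, which is exactly the credit-assignment wrinkle described above.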
Thinking further afield, future models could learn to adapt their expectations to fit the behavior of a particular opponent. This kind of metalearning is pretty much a wide-open problem, though a pair of (roughly equivalent) papers in this direction recently came out from DeepMind (https://arxiv.org/abs/1611.05763) and OpenAI (https://arxiv.org/abs/1611.02779). It's going to be really exciting to see how these techniques scale.
> we instead use features read from the game’s memory on each frame, consisting of each player’s position, velocity, and action state, along with several other values
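For illustration, here's roughly what flattening that kind of per-frame game state into a network input could look like; the field names and the one-hot encoding are my assumptions, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class PlayerState:
    # Hypothetical fields mirroring the values the paper says it reads.
    x: float
    y: float
    vx: float
    vy: float
    action_state: int  # categorical ID of the current animation/action

def features(p1, p2, n_actions=4):
    """Flatten both players into one input vector; the categorical action
    state is one-hot encoded rather than fed in as a raw integer."""
    def one_hot(i, n):
        return [1.0 if j == i else 0.0 for j in range(n)]
    vec = []
    for p in (p1, p2):
        vec += [p.x, p.y, p.vx, p.vy] + one_hot(p.action_state, n_actions)
    return vec

v = features(PlayerState(1.0, 2.0, 0.1, -0.2, 2),
             PlayerState(-1.0, 0.0, 0.0, 0.0, 0))  # len(v) == 16
```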
So it's cheating, presumably knowing the opponents action before the animation even starts to play.
While the AI might be cheating by taking salient features from RAM rather than from pixel values, this is still an incredible feat. Just a few years ago we did not have generic algorithms that could take even salient features and self-learn policies to near this level this quickly.
Yup, it's definitely an advantage to get all the correct values from the game state. But not as much as you might think; the vision portion of a DQN or similar trains quite quickly.
Plus, our bot doesn't have any clue about projectiles. We don't know where they live in memory, so the network doesn't get to know about them at all.
Why do you think the game is complex? Fairly simple game with low barrier to entry which is great when you invite guests over for games. Super Simple Button Mash!
What's the key insight here compared to previous systems? As far as I can tell, still no one can beat simple non-deterministic games that require some planning.
My favorite example is Ms. Pac Man because it seems so old and simplistic. Been tried by a dozen teams and no one can beat a decent human.
gwern | 9 years ago:
Good discussions on Reddit: https://www.reddit.com/r/MachineLearning/comments/5vh4ae/r_a... https://www.reddit.com/r/smashbros/comments/5vin8x/beating_t...
SerLava | 9 years ago:
With Marines usually.
swanson | 9 years ago:
https://www.youtube.com/watch?v=z-1YfhUFtbY&feature=youtu.be...