
Beating the World’s Best at Super Smash Bros. with Deep Reinforcement Learning

202 points | willwhitney | 9 years ago | arxiv.org

55 comments

[+] gwern|9 years ago|reply
Note: it doesn't learn from pixels but from features read directly from RAM; and it has superhuman reaction time, with performance degrading badly when human-like delays are added.

Good discussions on Reddit: https://www.reddit.com/r/MachineLearning/comments/5vh4ae/r_a... https://www.reddit.com/r/smashbros/comments/5vin8x/beating_t...
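
A reaction-time handicap like the one described above can be sketched as a fixed-length action buffer that holds each chosen action for a few frames before it reaches the game. This is an illustrative sketch, not the paper's actual code:

```python
from collections import deque

class DelayedActions:
    """Delay each chosen action by `delay` frames to simulate
    human-like reaction time. Illustrative sketch only."""

    def __init__(self, delay, noop=0):
        # Pre-fill with no-ops so the first `delay` frames do nothing.
        self.buffer = deque([noop] * delay)

    def step(self, action):
        self.buffer.append(action)
        return self.buffer.popleft()

# At 60 fps, a 6-frame delay is roughly 100 ms of reaction time.
agent = DelayedActions(delay=6)
executed = [agent.step(a) for a in range(10)]
# Action 0 only reaches the game at frame 6.
```

Wrapping the policy this way trains and evaluates it under the same delay, which is where the performance degradation shows up.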

[+] stcredzero|9 years ago|reply
I could see this technology used for the bootstrapping of highly emergent MMO game worlds. It could be used to populate a world with fake "player" NPCs that are actually part of a simulated online ecosystem. Give the NPCs a large enough population, such that players cannot exert significant selection pressure, but give the NPCs real selection pressure through interaction with artificial life evolved with Genetic Algorithms. The rate of evolution of the a-life and the NPCs could be tuned to provide a comfortable rate of change for the human players, and the NPCs would insulate the players from the frustrations GAs might cause.
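
The selection loop imagined here can be sketched as a minimal genetic algorithm, where the mutation rate is the knob tuning the "rate of change" players would experience. The genome, fitness function, and all parameters below are toy placeholders:

```python
import random

def evolve(population, fitness, generations=100, mutation_rate=0.2, seed=0):
    """Minimal GA loop: keep the fitter half (elitism),
    refill with mutated copies of the survivors."""
    rng = random.Random(seed)
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: len(population) // 2]
        mutants = [g + rng.gauss(0, mutation_rate) for g in survivors]
        population = survivors + mutants
    return max(population, key=fitness)

# Toy genome: a single float; fittest genomes are closest to 3.0.
best = evolve([0.0, 1.0, -2.0, 5.0], fitness=lambda g: -abs(g - 3.0))
```

A real NPC ecosystem would evolve behavior parameters rather than a single float, but the same loop structure applies.
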
[+] SerLava|9 years ago|reply
This reminds me of Starcraft AI experiments. They can't actually make the computer smart, so they just jam 2000 button presses per second down the tube, giving every single unit its own simultaneous AI, and it out-micromanages anyone.

With Marines usually.

[+] modeless|9 years ago|reply
I was similarly disappointed when I read this, but on further reflection I still like this paper. It's very plausible that both of these problems could be fixed; it would just take a lot more time and compute to train, and the resulting system would likely not run in real time, making it impossible to test against real humans.

Further advancement in this area will require huge leaps in hardware performance. Luckily in the next few years I expect that the pace of improvement in specialized hardware for neural nets will far outpace Moore's Law.

[+] willwhitney|9 years ago|reply
Handling delays (and the uncertainty they entail) is a huge challenge, and I think it'll be a rich area of research. The simplest part of the problem is that delays in action or perception also slow the propagation of reward signals, and credit assignment is still a really hard problem.

Thinking further afield, future models could learn to adapt their expectations to fit the behavior of a particular opponent. This kind of metalearning is pretty much a wide-open problem, though a pair of (roughly equivalent) papers in this direction recently came out from DeepMind (https://arxiv.org/abs/1611.05763) and OpenAI (https://arxiv.org/abs/1611.02779). It's going to be really exciting to see how these techniques scale.
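
The credit-assignment point can be made concrete with the standard discounted return: pushing a reward d frames later shrinks its contribution to the original action's return by a factor of gamma**d. Toy numbers below, not from the paper:

```python
def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over an episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

# Same single reward, observed on time vs. 10 frames late.
on_time = discounted_returns([0.0] * 5 + [1.0])    # reward at t=5
delayed = discounted_returns([0.0] * 15 + [1.0])   # reward at t=15
# The action at t=0 is credited gamma**5 vs. gamma**15 of the reward.
```

With delays of whole seconds of gameplay (dozens of frames at 60 fps), the signal reaching the responsible action gets very thin, which is one reason credit assignment stays hard.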

[+] revelation|9 years ago|reply
> we instead use features read from the game’s memory on each frame, consisting of each player’s position, velocity, and action state, along with several other values

So it's cheating, presumably knowing the opponent's action before the animation even starts to play.
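
The quoted feature set amounts to a small per-frame vector rather than an image, something like the following. Field names and layout here are made up for illustration; the paper's actual memory schema differs:

```python
def frame_features(p1, p2):
    """Flatten both players' state into one feature vector.
    Hypothetical layout; real RAM addresses/fields differ."""
    feats = []
    for p in (p1, p2):
        feats += [p["x"], p["y"], p["vx"], p["vy"], float(p["action_state"])]
    return feats

p1 = {"x": 10.0, "y": 0.0, "vx": 1.5, "vy": 0.0, "action_state": 44}
p2 = {"x": -3.0, "y": 5.0, "vx": 0.0, "vy": -2.1, "action_state": 12}
vec = frame_features(p1, p2)  # a handful of numbers vs. ~100k pixels
```

The "action state" field is the crux of the cheating complaint: it flips the instant a move begins, before any animation frame is drawn, so the agent can react to moves no human could see yet.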

[+] jwtadvice|9 years ago|reply
While the AI might be cheating by taking salient features from RAM rather than from pixel values, this is still an incredible feat. Just a few years ago we did not have generic algorithms that, even given salient features, could self-learn policies at this level this quickly.
[+] willwhitney|9 years ago|reply
Yup, it's definitely an advantage to get all the correct values from the game state. But not as much as you might think; the vision portion of a DQN or similar trains quite quickly.

Plus, our bot doesn't have any clue about projectiles. We don't know where they live in memory, so the network doesn't get to know about them at all.

[+] smaili|9 years ago|reply
As someone who's played for quite a while I can tell you SSBM is one of the most complex games I've ever come across.
[+] jensv|9 years ago|reply
Why do you think the game is complex? Fairly simple game with low barrier to entry which is great when you invite guests over for games. Super Simple Button Mash!
[+] lanius|9 years ago|reply
I'm impressed it beat the likes of S2J and Zhu. I wonder how it'd fare against the Five Gods?
[+] WhitneyLand|9 years ago|reply
What's the key insight here compared to previous systems? As far as I can tell, still no one can beat simple non-deterministic games that require some planning.

My favorite example is Ms. Pac-Man because it seems so old and simplistic. It's been tried by a dozen teams and no one can beat a decent human.

[+] cerved|9 years ago|reply
Civ AI has denounced this research
[+] fiatjaf|9 years ago|reply
I was expecting a video.