This looks really interesting. It would be a good project to hook this up to a general card-playing framework, so it could easily be evaluated on a variety of imperfect-information games based on playing cards.
I tried my hand once or twice at (re-)implementing board games [0], so that I could run some common "AI" algorithms on the game trees.
What tripped me up every time is that most board games have a lot of "if this happens, there is this specific rule that applies". Even relatively simple games (like Homeworlds) are pretty hard to nail down perfectly due to all the special cases.
Do you, or somebody else, have any recommendations on how to handle this?
[0] Dominion, Homeworlds and the battle part of Eclipse iirc.
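One common way to tame that "if this happens, there is this specific rule" explosion is to register each special case as a hook on a named event, so the core engine stays generic and the edge cases live in small, testable modifiers. A minimal sketch of that pattern (illustrative only, not taken from any of the games above):

```python
# Each special-case rule is a function that transforms an event's payload.
# The engine fires events; whatever hooks are registered get to modify them.
class RuleEngine:
    def __init__(self):
        self.hooks = {}  # event name -> list of modifier functions

    def on(self, event, hook):
        self.hooks.setdefault(event, []).append(hook)

    def fire(self, event, value):
        # Apply each registered special case in turn; no hooks means
        # the base rule's value passes through unchanged.
        for hook in self.hooks.get(event, []):
            value = hook(value)
        return value

engine = RuleEngine()
# Base rule: drawing gives you 1 card. Special case: some card effect
# doubles your draws this turn, registered without touching the engine.
engine.on("draw_count", lambda n: n * 2)
```

The appeal is that adding a new special case never means editing the core game loop, only registering another hook.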
Thank you for posting! Maybe you can include the game of Arimaa [1]. Arimaa was designed to be hard(er) for computers and to level the playing field for humans. Algorithms were eventually developed, though I have not kept up to know where that stands today.
[1] https://en.wikipedia.org/wiki/Arimaa
Thanks for this! I'm currently designing a language for complex board games like Battlestar Galactica: http://www.adama-lang.org/
Something I found amazing: inverting the flow control, so that the server asks players questions along with a list of possible choices, simplifies the agent design tremendously. As I'm looking to retire to work on this project, I can generate the agent code and then hand-craft an AI. However, some AIs are soooo hard to even conceptualize.
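The inverted flow described above can be sketched roughly like this (names are illustrative, not from adama-lang): the server owns the game loop and presents each agent a question together with the legal choices, so the agent never has to enumerate or validate moves itself.

```python
import random

class RandomAgent:
    """A trivial agent: it only ever picks from what the server offers."""
    def choose(self, question, choices):
        return random.choice(choices)

class Server:
    """The server drives play by asking questions, not by accepting moves."""
    def __init__(self, agents):
        self.agents = agents  # player name -> agent

    def ask(self, player, question, choices):
        answer = self.agents[player].choose(question, choices)
        # Illegal moves are impossible by construction: the agent can
        # only answer with one of the choices the server handed it.
        assert answer in choices
        return answer
```

Because the rules engine is the only thing that ever computes legal choices, even a one-line random agent is a complete, rules-correct player.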
Imperfect information games will always have a luck element that gives casual players an edge. That's basically the appeal of card games over board games.
This is clearly part of DeepMind's long-game plan to achieve world domination through board game mastery. Naming the new algorithm after the book is a real tip of their hand...
The abbreviation is PoG too. I bet that was totally on purpose. At least one person in Brain is a Dota player, so you better believe they watch Twitch.
Funny that most of the comments are about the name. What an excellent choice.
PSA: The "Culture" novels by Iain M Banks are fantastic and can be read in any order. "Player of Games" was the 1st one I read and still probably my favorite.
"In 2015, two SpaceX autonomous spaceport drone ships—Just Read the Instructions and Of Course I Still Love You—were named after ships in the book, as a posthumous tribute to Banks by Elon Musk"
Allusions are fun and all, but I disagree. These are important problems that a lot of people have put their whole careers into researching. Silly names like these lack gravitas.
I think it is also a reference to "PogChamp", although it's disappointing that PoG apparently wasn't evaluated against the Arcade Learning Environment (ALE) corpus of Atari 2600 games.
This is a great result, but you can see that it's more of a theoretical case because of this: "converging to perfect play as available computation time and approximation capacity increases." That is true for pretty much all current deep reinforcement learning algorithms.
The practical question is: how much computation do you need to get useful results? AlphaGo Zero is impressive mathematics, but who is willing to spend $1M a day for months to train it? IMPALA (another Google one) can learn almost all Atari games, but you need a head node with 256 TPU cores and 1000+ evaluation workers to replicate the timings from the paper.
You often don't need anywhere near the amount of compute in these papers to get similar performance.
Suppose you're a business that needs to play games. Most people seem to think that it's a matter of plugging in the settings from the paper, buying the same hardware, then clicking a button and waiting.
It's not. The specific settings matter a lot.
But my main point is that you'll get most of your performance pretty rapidly. The only reason to leave it running for so long is to get that last N%, which is nice for benchmarks but not for business.
DeepMind overspends. Actually, they don't; they're not paying anywhere close to the price of a 256 core TPU. (Many external companies aren't, either, and you can get a good deal by negotiating with the Cloud TPU team.)
But you don't need a 256 core TPU. Lots of times, these algorithms simply do not require the amount of compute that people throw at the problem.
On the other hand, you can also usually get access to that kind of compute. A 256 core TPU isn't beyond reach. I'm pretty sure I could create one right now. It's free, thanks to TFRC, and you yourself can apply (and be approved). I was. https://sites.research.google/trc/
It kills me that it's so hard to replicate these papers, which is most of the motivation for my comment here. Ultimately, you're right: "How much compute?" is a big unknown. But the lower bound is much lower than most people (and most researchers) realize.
> That is true for pretty much all current deep reinforcement learning algorithms.
Is that true? I was unaware that PPO, SAC, DQN, IMPALA, MuZero/AlphaZero etc. would all automatically Just Work™ for hidden-information games. Straight MCTS-inspired algorithms seem like they'd fail for reasons discussed in the paper, and while PPO/IMPALA work reasonably well in Dota 2/SC2, it's not obvious they'd converge to perfect play.
Comparing against Stockfish 8 in a paper released today and labeling it as "Stockfish" borders on dishonest. The current Stockfish version (14) would make AlphaZero look bad, so they don't include it ...
The name of the game here is generality. For a really general agent, they are looking to have superhuman performance, not get state of the art on every individual task. Beating stockfish 8 convinces me that it would be superhuman at chess.
The first mention says "Stockfish 8, level 20" in the paper. This isn't a blog post that you can skim, you need to read the whole thing before critiquing.
Isn't the point comparing traditional heuristic techniques against DNN-learned techniques? I understand the latest Stockfish is inching quite close to AlphaZero's techniques, but maybe I am wrong.
The abstract clearly states that the best chess and Go bots are not beaten: "Player of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker (Slumbot)..."
I think this is a good step forward that generalizes an algorithm to play both perfect- and imperfect-information games. However, Table 9 appears to show (it is not presented in the most intuitive form) that other AIs (DeepStack, ReBeL, and Supremus) eat its lunch at poker. It also performs worse than AlphaZero at perfect-information games. So, while it's a nice generalizing framework, it probably won't be what you use in practice.
I didn't even know about the book until I read the comments here, I thought it was a reference to the Grimes song. Funny coincidence the song and the engine would appear so close in time to one another.
The Grimes song is a reference to the book too. She also has subtitles in Marain, the language used in the Culture, in her video for "Idoru". A weird mix of two authors' works (Idoru being a William Gibson title), to be sure.
This seems like a significant milestone in AI. I mean what can't an agent with mastery of "guided search, learning, and game-theoretic reasoning" accomplish?
Anyone else surprised to see that Demis Hassabis didn't have a hand in this research, given his background as a player of many games and his involvement in much of their research?
I'm more surprised David Silver isn't on it, since his background is in imperfect information games, with papers such as https://arxiv.org/abs/1603.01121
He did multiple poker papers before he was the main author of AlphaZero.
I want to see DeepMind make a bot to play team-based first-person shooters like CS:GO and Rainbow Six Siege, and stack five of them up against a team of professional players.
Honestly, that probably wouldn't be too interesting, since (a) one AI could perfectly control several agents (i.e., perfect coordination of global strategy) and (b) an AI has low-to-no reaction time and perfect aim (aimbots already have that), so I would expect it would quickly turn into a slaughterfest.
It would be awesome to have two interacting communities: AI experts building open source general game playing engines, and gaming fans writing pluggable rule specifications and UIs for popular games.
A bit of googling shows that there is a General Game Playing AI community with their own Game Description Language. I never really encountered them before, and the DeepMind paper does not cite them, either.
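The GGP community's Game Description Language factors every game into the same handful of relations: roles, the initial state, the legal moves, the successor state, the terminal test, and per-role goal values. A rough Python analogue of that interface, using a deliberately trivial game (the class and method names here are my own, not the GDL syntax):

```python
# A one-shot matching-pennies game expressed through the GDL-style
# relations. An engine written against these six methods can play any
# game that implements them, which is the whole point of GGP.
class MatchingPennies:
    def roles(self):
        return ["odd", "even"]

    def initial_state(self):
        return None  # one simultaneous move, so there is no board state

    def legal_moves(self, state, role):
        return ["heads", "tails"]

    def next_state(self, state, joint_move):
        # The successor of the initial state is just the pair of choices.
        return joint_move

    def is_terminal(self, state):
        return state is not None

    def goal(self, state, role):
        # GDL convention: goal values range over [0, 100].
        match = state["odd"] == state["even"]
        win = (role == "even") == match
        return 100 if win else 0
```

Real GDL is a logic-programming syntax rather than a class, but the factoring into roles/legal/next/terminal/goal is the same.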
Fun fact: the consensus among professional Go and chess players is that all the new AI systems (AlphaGo, etc.) have really revitalised the games and introduced an incredible amount of new strategies and depth.
You're getting downvotes but honestly I agree. Who cares about board games? We should've moved on from this once we "solved" chess and Go. There are more important things and it's not remotely surprising that a computer can beat a human when there's a simple, abstract optimization problem to throw computing power at. Make it creative...now that's a challenge worthy of the top AI talent.
Solving the game comes before solving for fun. If we create an AI that can win, then we can hamper the AI in fun ways, or give it an altered objective function that maximizes the player's fun.
Yes indeed. AI research will only take a real step forward when it learns how to be creative instead of just very good at optimising simple formal systems like board games.