One of the researchers, Tuomas Sandholm, has a real badass CV. Former pilot in the Finnish Air Force. Finnish windsurfing champion. Snowboarder. Professor at Carnegie Mellon. Speaks four European languages, including Swedish. And now, at the age of 51, he has created the best AI-powered poker bot.
>> Pluribus is also unusual because it costs far less to train and run than other recent AI systems for benchmark games. Some experts in the field have worried that future AI research will be dominated by large teams with access to millions of dollars in computing resources. We believe Pluribus is powerful evidence that novel approaches that require only modest resources can drive cutting-edge AI research.
That's the best part in all of this. I'm not convinced by the claim the authors repeatedly make that this technique will translate well to real-world problems. But I'm hoping there will be more results of this kind, signalling a shift away from Big Data and huge compute and towards well-designed, efficient algorithms.
In fact, I kind of expect it. The harder it gets to do the kind of machine learning that only large groups like DeepMind and OpenAI can do, the more smaller teams will push the other way and find ways to keep making progress cheaply and efficiently.
It's easy to take away too much from this. The focus should stay on the fact that an AI poker bot did this, without reading too much into adjacent subjects.
But what's the fun in that?
10,000 hands is an interesting number. If you search the poker forums, you'll see it's the number people throw out for how many hands you need to see before you can analyze your play. You then make adjustments and see another 10,000 hands before you can assess those changes.
In 2019, live poker is an impractical place to adapt as a competitive player. An online grinder can see 10,000 hands within a day; the live sessions here took 12 days. Online poker also lets players use data to their advantage.
So I wouldn't consider 10K hands long term, even if it was spread over 12 days. Once players get a chance to adapt, they'll increase their win rate against a bot. Once hand histories start being shared, it's all over. And again, give these players their own software tools.
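For a sense of scale, the noise in a 10,000-hand sample can be estimated directly. A rough sketch, assuming a standard deviation of about 100 bb per 100 hands (a commonly quoted ballpark for no-limit hold'em; the true figure varies by game and lineup — the function name and numbers here are illustrative, not from the paper):

```python
import math

def winrate_ci(hands, std_bb_per_100=100.0, z=1.96):
    """95% confidence half-width (in bb/100) of a measured winrate.

    Treats each 100-hand block as an i.i.d. sample with the given
    standard deviation. 100 bb/100 is only a rough figure for
    no-limit hold'em; the true value depends on the game.
    """
    blocks = hands / 100.0  # number of 100-hand blocks observed
    return z * std_bb_per_100 / math.sqrt(blocks)

print(winrate_ci(10_000))   # ~19.6 bb/100: a 10k-hand winrate is very noisy
print(winrate_ci(100_000))  # ~6.2 bb/100
```

At that noise level a raw 10,000-hand winrate comes with roughly a ±20 bb/100 confidence interval, which is presumably why the Pluribus evaluation leaned on variance-reduction techniques (the paper uses AIVAT) rather than raw sample size.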
Remember that one of the most exciting events in online poker was the run of Isildur1. That run was put to rest when he went bust against players who had studied thousands of his hand histories.
This doesn't take away from the development of the bot. If we learn something from it, then all good.
What took you so long? I mean not the Pluribus team specifically, but poker AI researchers in general.
The desire to master this sort of game has inspired the development of entire branches of mathematics. Computers are better at maths than humans. They're less prone to hazardous cognitive biases (gambler's fallacy etc.) and can put on an excellent poker face.
As a layperson who's rather ignorant about both no-limit Texas hold 'em and applicable AI techniques, my intuition would tell me that super-human hold 'em should have been achieved before super-human Go. Apparently your software requires way less CPU power than AlphaGo/AlphaZero, which seems to support my hypothesis. What am I missing?
Bonus questions in case you have the time and inclination to oblige:
What does this mean for people who like to play on-line Poker for real money?
Could you recommend some literature (white papers/books/lecture series/whatever) to someone interested in writing an AI (running on potato-grade hardware) for a niche "draft and pass" card game (e.g. Sushi Go!) as a recreational programming exercise?
The bot does not seem to consider previous hands in its decisions. That is to say, it does not consider who it is playing against. Should this affect how we perceive the bot as “strategic” or not? Bots that play purely mathematically optimally on expected value aren’t effective or interesting. But it feels like this is playing on just a much higher order expected value.
It feels like a more down-to-earth version of the sci-fi superhuman running impossible differential equations to predict exactly what you will do, given knowledge that he knows that you know what he knows... etc. ad infinitum. But since it doesn't actually consider the person it's predicting, it may simply be a really, really good approximation of the game-theoretic dominant strategy.
At what complexity of game and hidden information should we feel like the bot can’t win by running a lookup table?
First of all, I laughed at the 20-second average per game in self-play, since I ran into the same thing and have been trying to speed up the algorithm but haven't been able to get it faster (without throwing more hardware at it).
Second, I haven't read everything, but I believe you are playing a cash-game and not tournament-style. Is that correct? If that is the case, any chance you will be doing a tourney-style version?
[For those who don't play, in cash, a dollar is a dollar. In Tourney play, the top 2 or 3 players get paid out, so all dollars are not equal, as your strategy changes when you have only a few chips left (avoid risky bets that would knock you out) or when you are chip leader (take risky bets when they are cheap to push around your opponents).]
Also, curious how much poker you folks play in the lab for "research".
How do you think these same pros would do in a follow-up match? As described in the article, the bot put players off their game with much more varied betting and with donks. Do you think the margin would decrease as players are exposed to these strategies?
Players face mental fatigue and have so over-learned their existing strategies that it takes time to adapt new strategies and even more time for those new strategies to become second-nature.
It reminds me of sports in a way. Teams start running a new wrinkle of offense in the NFL like the wildcat and it takes a few seasons for teams to instinctively know how to play defense correctly against that option.
Could you perhaps speak to some of the engineering details that the paper glosses over? E.g.:
- Are the action and information abstraction procedures hand-engineered or learned in some manner?
- How does it decide how many bets to consider in a particular situation?
- Is there anything interesting going on with how the strategy is compressed in memory?
- How do you decide in the first betting round if a bet is far enough off-tree that online search is needed?
- When searching beyond leaf nodes, how did you choose how far to bias the strategies toward calling, raising, and folding?
- After it calculates how it would act with every possible hand, how does it use that to balance its strategy while taking into account the hand it is actually holding?
- In general, how much do these kind of engineering details and hyperparameters matter to your results and to the efficiency of training? How much time did you spend on this? Roughly how many lines of code are important for making this work?
- Why does this training method work so well on CPUs vs GPUs? Do you think there are any lessons here that might improve training efficiency for 2-player perfect-information systems such as AlphaZero?
In your Science paper, you mention playing 1H-5AI against 2 human players: Chris Ferguson and Darren Elias. In your blog post you also mention playing 1H-5AI against Linus Loeliger, who was within standard error of even money. Why did Linus not make it into the Science paper?
The article makes it sound like the AI is trained by evaluating results of decisions it makes on a per-hand basis. Is there any sense in which the AI learns about strategies that depend upon multiple hands? I’m thinking of bluffing/detecting bluffs and identifying recent patterns, which is something human poker players talk about.
Was Judea Pearl's work relevant for the counterfactual regret minimization, or is there some other basis? I've added CFR to the list of things to look into later, but skimming the paper it was exciting to think advances are being made using causal theory...
Who were the pros? Are they credible endbosses? Seth Davies works at RIO, which deserves respect, but I've never heard of the others except Chris Ferguson, who I doubt is a very good player by today's standards (or human being, for that matter). Meanwhile, I do know the likes of LLinusLLove (iirc, the king of 6-max), Polk and Phil Galfond.
Is 10,000 hands really considered a good enough sample? Most people consider 100k hands with a 4bb winrate to be an acceptable sample, other math aside. However, when you and your opponent play with equal skill, variance increases to the point where regs refuse to sit each other.
I'm very late to this post, so not sure if you're still around.
What are your thoughts on a poker tournament for bots? Do you think it could turn into a successful product? I've always wanted to build an online poker/chess game that was designed from the ground up for bots (everything being accessible through an API), but have always worried that someone with more computational resources or the best bot would win consistently. Is it an idea you've thought about?
I have a few basic questions. I would like to implement my own generic game bot (discrete states). Are there any universal approaches? Is MCMC sampling good enough to start? My initial idea was to do importance sampling on some utility/score function.
Also, I am looking into poker game solvers - what would be a good place to start? What's the simplest algorithm?
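On the "simplest algorithm" question: the usual entry point is regret matching, the update rule at the heart of counterfactual regret minimization (CFR). A minimal self-play sketch for rock-paper-scissors — purely illustrative, not anything from the paper; full CFR applies this same update at every information set of the game tree:

```python
import random

# payoff[a][b]: row player's utility for action a vs action b
# (actions 0, 1, 2 = rock, paper, scissors)
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def current_strategy(regrets):
    """Regret matching: mix actions in proportion to positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    n = len(regrets)
    return [p / total for p in pos] if total > 0 else [1.0 / n] * n

def train(iters=50_000, seed=1):
    n = len(PAYOFF)
    rng = random.Random(seed)
    regrets = [[0.0] * n for _ in range(2)]
    strat_sums = [[0.0] * n for _ in range(2)]
    for _ in range(iters):
        strats = [current_strategy(r) for r in regrets]
        actions = [rng.choices(range(n), weights=s)[0] for s in strats]
        for p in range(2):
            me, opp = actions[p], actions[1 - p]
            for a in range(n):
                # regret: how much better action a would have done
                # against the opponent's realized action
                regrets[p][a] += PAYOFF[a][opp] - PAYOFF[me][opp]
            for a in range(n):
                strat_sums[p][a] += strats[p][a]
    # the *average* strategy is what converges to equilibrium
    return [[x / sum(ss) for x in ss] for ss in strat_sums]

avg = train()
print([round(p, 2) for p in avg[0]])  # approaches the uniform equilibrium (1/3, 1/3, 1/3)
```

The current strategy oscillates; it's the time-averaged strategy that approaches the Nash equilibrium, which is also why poker bots built on CFR report the average policy rather than the last iterate.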
Knowing when to bluff often depends on the psychology of the opponent, but since it trained by playing itself, it doesn't seem that knowing when to bluff would be learned. Did it bluff very often?
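One way to see why self-play can learn bluffing without psychology: at equilibrium, bluffing frequencies fall out of indifference conditions rather than opponent reads. A toy river calculation under a simplified model (the bettor's range is polarized and the caller either bluff-catches or folds — an illustration, not the bot's actual computation):

```python
def equilibrium_bluff_fraction(pot, bet):
    """Fraction of a polarized river betting range that should be bluffs.

    The caller risks `bet` to win `pot + bet`. Making the caller
    indifferent between calling and folding:
        f * (pot + bet) - (1 - f) * bet = 0  =>  f = bet / (pot + 2 * bet)
    """
    return bet / (pot + 2 * bet)

print(equilibrium_bluff_fraction(pot=100, bet=100))  # 0.333...: pot-sized bet -> 1/3 bluffs
print(equilibrium_bluff_fraction(pot=100, bet=50))   # 0.25: half-pot bet -> 1/4 bluffs
```

A self-play learner that stops bluffing gets exploited by its own copy folding less, and one that over-bluffs gets called down, so it is pushed toward frequencies like these with no model of any particular opponent.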
Are there any ethical considerations relating to the prospect of use of this bot for cheating in real-money games? Either from your internal team or after public replication?
Very impressive. If my understanding of how the AI works is correct, it is using a pre-computed strategy developed by playing trillions of hands, but it is not dynamically updating that during game play, nor building any kind of profiles of opponents. I wonder if by playing against it many times, human opponents could discern any tendencies they could exploit. Especially if the pre-computed strategy remains static.
A hearty congratulations, Noam, on finishing another chapter of the story I opened in the early 1990s...
Another person asked "What took you so long?", and I had the same question. :) I really thought this milestone would be achieved fairly soon after I left the field in 2007. However, breakthroughs require a researcher with the right amount of reflectiveness, insight, and determination.
This is fascinating stuff. So do I understand this right: Libratus worked by computing a Nash equilibrium, while the new multiplayer version works using self-play like AlphaGo Zero? Did you run the multiplayer version against the two-player version? If yes, how did it go? Could you recommend a series of books / papers that can take me from zero to being able to reprogram this (I know programming and mathematics, but not much statistics)? And how much computing resources / time did it take to train your bot?
So let me see if I understand this. I don't believe it's hard to write a probabilistic program to play poker. That's enough to win against humans in 2-player.
With one AI and multiple professional human players sitting at a physical table, the humans outperform the probabilistic model because they take advantage of each other's mistakes/styles. Some players crash out faster but the winner gets ahead of the safe probabilistic style of play.
So this bot is better at the current professional player meta than the current players. In a 1v1 against a probabilistic model, it would probably also lose?
Am I understanding this properly? Or is playing the probabilistic model directly enough of a tell that it's also losing strategy? Meaning you need some variation of strategies, strategy detection, or knowledge of the meta to win?
Hi Noam: I'm intrigued that you trained/tested the bot against strategies that were skewed to raise a lot, fold a lot and check a lot, as well as something resembling GTO. Were there any kinds of table situations where the bot had a harder time making money? Or where the AI crushed it?
I'm thinking in particular of unbalanced tables with an ever-changing mixture of TAG and LAG play. I've changed my mind three times about whether that's humans' best refuge -- or a situation that's a bot's dream.
With the advent of AI bots in poker, chess, etc., what happens to the old adage of "Play the player, not the game"? How do modern human players manage when you don't have the psychological aspects of the game to work with?
I see on chess channels that grand masters have to rethink their whole game preparation methodology to cope with the "Alpha Zero" oddities that have now been introduced into this ancient game. They literally have to "throw out the book" of standard openings and middle games and start afresh.
This is something I've always wondered, how come bots haven't taken over online poker considering how much money there is to be made, and all you need is to be slightly better than average right? Is high level poker really that hard to achieve?
Pretty incredible that this has scaled down from 100 CPUs (and a couple terabytes of RAM) for their two player limit hold'em bot to just two CPUs for the no limit bot.
I have a question about collusion. For the 5-human + 1-AI setting, since the human pros know which player is the AI (read from your previous response), is it possible for the human players to collude to beat the AI? And in theory, for a multiplayer game, even if the AI plays the best strategy, can it still be beaten by collusion among the other players?
The fact that there's a published recipe for a superhuman bot that can be trained for $150 and run on any desktop computer sounds like an existential threat to their business.
The main mitigating factor I can think of is that you'd need to also adversarially train it so it isn't distinguishable from a skilled human. But that doesn't seem like it would be too difficult.
How do we know that online poker has ever been a fair game? Has anyone ever done a statistical study of verified real players to determine whether their collective historical winnings match what would be expected in a fair game? It seems like it would be much too easy for the operators to skim money in any one of a thousand ways. I've never understood the trust people place in online gambling in general.
I don't get this... Poker isn't purely mathematical. It has emotions involved: greed, fear, belief, reading others, manipulation to fool the opponent, and maybe more. And all of these emotions arise differently for different people based on their time, place, world view, background and history...
Are we now saying that a computer can do all this in simulation? If so, it's a great breakthrough in human history.
At the nosebleeds, poker hasn't been about those things in a long time.
Poker is about exploitative play against people who base their play on emotions, and imperfect game-theory-optimal play against players who don't. The more perfect the GTO play is, the higher the winrate against the latter group, but higher-stakes games are built around one or more bad players - pros will literally stop playing as soon as the fish busts.
Isn't it just possible that the bot got lucky? It plays well. Maybe really well, but does it play as well as a pro? Would it win nine WSOP bracelets? Would it make it to day 3 of the World Series of Poker?
Chris Moneymaker got some damn good hands. It's part of the game. It's why this feat is unremarkable and why poker is a crap game for AI. The outcomes are very loose, especially when the reason these guys are pros is partially their ability to read.
You are taking away a tool that made these poker players great and then expecting them to be a metric to test the AI. A better test would be to have pro players play sets of 1, 2, 4 and 7 basic rule-based bots while the AI does the same. Then you compare differences in play. With enough data points you can compare similar situations where the AI did better or worse. That is a fair comparison of skill.
Also, if there are professional players at a multiplayer game, the AI is getting help from the other players. Just like in Civ V, where I get help from the AI attacking itself, I'm sure this AI got help from the players attacking each other (especially if they were making the pot bigger for the AI to grab up — think of a player re-raising another player after the bot does a check all-in).
Despite the luck/noise in Poker, there are reasonable measures of performance, and while I'm not an expert in this area, the bot seems to be doing very well (see paper for details). Poker is not a "crap game for AI" it's actually quite a good game. It's a very simple example of a game with a lot of randomness (a feature not a bug) and hidden information that still admits a wide variety of skill levels (expert play is much better than intermediate play is much better than novice play). This is a great accomplishment.
I would love to get my hands on the source. Hook it up to an API like https://pokr.live and then basically build a computer-vision poker bot.
The trick is how to create natural mouse movements or keyboard inputs. This is the part I'm most shaky on, but the pokr.live API works by sending screenshots, which it translates into player actions at the table.
I was thinking a year ago about using deep reinforcement learning in a poker bot. What stopped me was the impossible amount of data and computation due to the imperfect-information nature of poker games. If I have the time, I'll try to implement something akin to the search technique described in the paper.
thomasfl | 6 years ago
https://www.cs.cmu.edu/~sandholm/cv.pdf
pesenti | 6 years ago
Science article: https://science.sciencemag.org/content/early/2019/07/10/scie...
YeGoblynQueenne | 6 years ago
samfriedman | 6 years ago
gexla | 6 years ago
noambrown | 6 years ago
n3k5 | 6 years ago
b_tterc_p | 6 years ago
bluetwo | 6 years ago
snarf21 | 6 years ago
tc | 6 years ago
andr3w321 | 6 years ago
isaacg | 6 years ago
spenczar5 | 6 years ago
Jach | 6 years ago
throwamay1241 | 6 years ago
kapurs151 | 6 years ago
tasubotadas | 6 years ago
Thanks
haburka | 6 years ago
waynecochran | 6 years ago
samfriedman | 6 years ago
pogopop77 | 6 years ago
darse | 6 years ago
Well done.
hoerzu | 6 years ago
auggierose | 6 years ago
JaRail | 6 years ago
GCA10 | 6 years ago
You've done the work. Insights welcome.
cyberferret | 6 years ago
neural_thing | 6 years ago
DevX101 | 6 years ago
jessriedel | 6 years ago
pennaMan | 6 years ago
ehsankia | 6 years ago
asdfman123 | 6 years ago
merlincorey | 6 years ago
donk2019 | 6 years ago
Thanks.
asdfman123 | 6 years ago
Will it just become increasingly sophisticated bots playing each other online?
trishume | 6 years ago
vannevar | 6 years ago
solidasparagus | 6 years ago
OpenAI Five beat the world champions in back-to-back games...
r00fus | 6 years ago
Was it online? The picture in the article seems to imply IRL.
If IRL, what inputs did it have, simply cards shown or could it read tells? Did those players know they were playing an AI?
grandtour001 | 6 years ago
rofo1 | 6 years ago
Maybe a bot technically qualifies as an opponent in durrr's challenge [0]? :)
How would bluffing influence the outcome? Both these players who are considered very strong, are known to play all kinds of hands.
[0] - https://en.wikipedia.org/wiki/Tom_Dwan#Full_Tilt_Poker_Durrr...
nishantvyas | 6 years ago
throwamay1241 | 6 years ago
luckyalog | 6 years ago
awal2 | 6 years ago
More links for reference:
https://ai.facebook.com/blog/pluribus-first-ai-to-beat-pros-...
https://science.sciencemag.org/content/early/2019/07/10/scie...
vecter | 6 years ago
That's not luck. See also: https://news.ycombinator.com/item?id=20416099
Also, Chris Moneymaker is a good poker player. He's no Phil Ivey or Tom Dwan, but he's still very good and has had decent results after his WSOP win.
w_s_l | 6 years ago
disclaimer: pokr.live API is a WIP
DeathArrow | 6 years ago
It might pay better than a full time job.