One of the researchers, Tuomas Sandholm, has a real badass CV. Former pilot in the Finnish Air Force. Finnish windsurfing champion. Snowboarder. Professor at Carnegie Mellon. Speaks four European languages, including Swedish. And now, at the age of 51, he has created the best AI-powered poker bot.
>> Pluribus is also unusual because it costs far less to train and run than other recent AI systems for benchmark games. Some experts in the field have worried that future AI research will be dominated by large teams with access to millions of dollars in computing resources. We believe Pluribus is powerful evidence that novel approaches that require only modest resources can drive cutting-edge AI research.
That's the best part in all of this. I'm not convinced by the claim the authors repeatedly make that this technique will translate well to real-world problems. But I'm hoping there will be more results of this kind, signalling a shift away from Big Data and huge compute and towards well-designed, efficient algorithms.
In fact, I kind of expect it. The harder it gets to do the kind of machine learning that only large groups like DeepMind and OpenAI can do, the more smaller teams will push the other way and find ways to keep making progress cheaply and efficiently.
It's easy to take away too much from this. The focus should stay on the fact that an AI poker bot did this, without reading too much into adjacent subjects.
But what's the fun in that?
10,000 hands is an interesting number. If you search the poker forums, you'll see it's the number people throw out for how many hands you need to see before you can analyze your play. You then make adjustments and see another 10,000 hands before you can assess those changes.
In 2019, live poker is an impractical place to adapt as a competitive player. An online grinder can see 10,000 hands within a day; the live sessions here took 12 days. Online poker also lets players use data to their advantage.
So I wouldn't consider 10K hands long term, even if it was spread over 12 days. Once players get a chance to adapt, they'll increase their win rate against a bot. Once hand histories start being shared, it's all over. And again, give these players their own software tools.
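For a sense of scale, the noise in a 10,000-hand sample can be estimated directly. A rough sketch, assuming a standard deviation of about 100 bb per 100 hands (a commonly quoted ballpark for no-limit hold'em; the true figure varies by game and lineup — the function name and numbers here are illustrative, not from the paper):

```python
import math

def winrate_ci(hands, std_bb_per_100=100.0, z=1.96):
    """95% confidence half-width (in bb/100) of a measured winrate.

    Treats each 100-hand block as an i.i.d. sample with the given
    standard deviation. 100 bb/100 is only a rough figure for
    no-limit hold'em; the true value depends on the game.
    """
    blocks = hands / 100.0  # number of 100-hand blocks observed
    return z * std_bb_per_100 / math.sqrt(blocks)

print(winrate_ci(10_000))   # ~19.6 bb/100: a 10k-hand winrate is very noisy
print(winrate_ci(100_000))  # ~6.2 bb/100
```

At that noise level a raw 10,000-hand winrate comes with roughly a ±20 bb/100 confidence interval, which is presumably why the Pluribus evaluation leaned on variance-reduction techniques (the paper uses AIVAT) rather than raw sample size.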
Remember that one of the most exciting events in online poker was the run of Isildur1. That run was put to rest when he went bust against players who had studied thousands of his hand histories.
This doesn't take away from the development of the bot. If we learn something from it, then all good.
What took you so long? I mean not the Pluribus team specifically, but poker AI researchers in general.
The desire to master this sort of game has inspired the development of entire branches of mathematics. Computers are better at maths than humans. They're less prone to hazardous cognitive biases (gambler's fallacy etc.) and can put on an excellent poker face.
As a layperson who's rather ignorant about both no-limit Texas hold 'em and applicable AI techniques, my intuition would tell me that super-human hold 'em should have been achieved before super-human Go. Apparently your software requires way less CPU power than AlphaGo/AlphaZero, which seems to support my hypothesis. What am I missing?
Bonus questions in case you have the time and inclination to oblige:
What does this mean for people who like to play on-line Poker for real money?
Could you recommend some literature (white papers/books/lecture series/whatever) to someone interested in writing an AI (running on potato-grade hardware) for a niche "draft and pass" card game (e.g. Sushi Go!) as a recreational programming exercise?
The bot does not seem to consider previous hands in its decisions. That is to say, it does not consider who it is playing against. Should this affect how we perceive the bot as “strategic” or not? Bots that play purely mathematically optimally on expected value aren’t effective or interesting. But it feels like this is playing on just a much higher order expected value.
It feels like a more down-to-earth version of the sci-fi superhuman running impossible differential equations to predict exactly what you will do, given knowledge that he knows that you know what he knows... etc. ad infinitum. But since it doesn't actually consider the person it's predicting, it may simply be a really, really good approximation of the game-theoretic dominant strategy.
At what complexity of game and hidden information should we feel like the bot can’t win by running a lookup table?
First of all, I laughed at the 20-second average per game in self-play, since I ran into the same thing and have been trying to speed up the algorithm but haven't been able to get it faster (without throwing more hardware at it).
Second, I haven't read everything, but I believe you are playing a cash-game and not tournament-style. Is that correct? If that is the case, any chance you will be doing a tourney-style version?
[For those who don't play, in cash, a dollar is a dollar. In Tourney play, the top 2 or 3 players get paid out, so all dollars are not equal, as your strategy changes when you have only a few chips left (avoid risky bets that would knock you out) or when you are chip leader (take risky bets when they are cheap to push around your opponents).]
Also, curious how much poker you folks play in the lab for "research".
How do you think these same pros would do in a follow-up match? As described in the article, the bot put players off their game with much more varied betting and with donks. Do you think the margin would decrease as players are exposed to these strategies?
Players face mental fatigue and have so over-learned their existing strategies that it takes time to adapt new strategies and even more time for those new strategies to become second-nature.
It reminds me of sports in a way. Teams start running a new wrinkle of offense in the NFL like the wildcat and it takes a few seasons for teams to instinctively know how to play defense correctly against that option.
Could you perhaps speak to some of the engineering details that the paper glosses over? E.g.:
- Are the action and information abstraction procedures hand-engineered or learned in some manner?
- How does it decide how many bets to consider in a particular situation?
- Is there anything interesting going on with how the strategy is compressed in memory?
- How do you decide in the first betting round if a bet is far enough off-tree that online search is needed?
- When searching beyond leaf nodes, how did you choose how far to bias the strategies toward calling, raising, and folding?
- After it calculates how it would act with every possible hand, how does it use that to balance its strategy while taking into account the hand it is actually holding?
- In general, how much do these kind of engineering details and hyperparameters matter to your results and to the efficiency of training? How much time did you spend on this? Roughly how many lines of code are important for making this work?
- Why does this training method work so well on CPUs vs GPUs? Do you think there are any lessons here that might improve training efficiency for 2-player perfect-information systems such as AlphaZero?
In your Science paper, you mention playing 1H-5AI against 2 human players: Chris Ferguson and Darren Elias. In your blog post you also mention playing 1H-5AI against Linus Loeliger, who was within standard error of even money. Why did Linus not make it into the Science paper?
The article makes it sound like the AI is trained by evaluating results of decisions it makes on a per-hand basis. Is there any sense in which the AI learns about strategies that depend upon multiple hands? I’m thinking of bluffing/detecting bluffs and identifying recent patterns, which is something human poker players talk about.
Was Judea Pearl's work relevant for the counterfactual regret minimization, or is there some other basis? I've added CFR to the list of things to look into later, but skimming the paper it was exciting to think advances are being made using causal theory...
Who were the pros? Are they credible endbosses? Seth Davies works at RIO, which deserves respect, but I've never heard of the others except Chris Ferguson, who I doubt is a very good player by today's standards (or human being, for that matter). Meanwhile, I do know the likes of LLinusLLove (iirc, the king of 6-max), Polk and Phil Galfond.
Is 10,000 hands really considered a good enough sample? Most people consider 100k hands with a 4bb winrate to be an acceptable sample, other math aside. However, when you and your opponent play with equal skill, variance increases to the point where regs refuse to sit each other.
I'm very late to this post, so not sure if you're still around.
What are your thoughts on a poker tournament for bots? Do you think it could turn into a successful product? I've always wanted to build an online poker/chess game that was designed from the ground up for bots (everything being accessible through an API), but have always worried that someone with more computational resources or the best bot would win consistently. Is it an idea you've thought about?
I have a few basic questions. I would like to implement my own generic game bot (discrete states). Are there any universal approaches? Is MCMC sampling good enough to start? My initial idea was to do importance sampling on some utility/score function.
Also, I am looking into poker game solvers - what would be a good place to start? What's the simplest algorithm?
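On the "simplest algorithm" question: the usual entry point is regret matching, the update rule at the heart of counterfactual regret minimization (CFR). A minimal self-play sketch for rock-paper-scissors — purely illustrative, not anything from the paper; full CFR applies this same update at every information set of the game tree:

```python
import random

# payoff[a][b]: row player's utility for action a vs action b
# (actions 0, 1, 2 = rock, paper, scissors)
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def current_strategy(regrets):
    """Regret matching: mix actions in proportion to positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    n = len(regrets)
    return [p / total for p in pos] if total > 0 else [1.0 / n] * n

def train(iters=50_000, seed=1):
    n = len(PAYOFF)
    rng = random.Random(seed)
    regrets = [[0.0] * n for _ in range(2)]
    strat_sums = [[0.0] * n for _ in range(2)]
    for _ in range(iters):
        strats = [current_strategy(r) for r in regrets]
        actions = [rng.choices(range(n), weights=s)[0] for s in strats]
        for p in range(2):
            me, opp = actions[p], actions[1 - p]
            for a in range(n):
                # regret: how much better action a would have done
                # against the opponent's realized action
                regrets[p][a] += PAYOFF[a][opp] - PAYOFF[me][opp]
            for a in range(n):
                strat_sums[p][a] += strats[p][a]
    # the *average* strategy is what converges to equilibrium
    return [[x / sum(ss) for x in ss] for ss in strat_sums]

avg = train()
print([round(p, 2) for p in avg[0]])  # approaches the uniform equilibrium (1/3, 1/3, 1/3)
```

The current strategy oscillates; it's the time-averaged strategy that approaches the Nash equilibrium, which is also why poker bots built on CFR report the average policy rather than the last iterate.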
Knowing when to bluff often depends on the psychology of the opponent, but since it trained by playing itself, it doesn't seem that knowing when to bluff would be learned. Did it bluff very often?
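One way to see why self-play can learn bluffing without psychology: at equilibrium, bluffing frequencies fall out of indifference conditions rather than opponent reads. A toy river calculation under a simplified model (the bettor's range is polarized and the caller either bluff-catches or folds — an illustration, not the bot's actual computation):

```python
def equilibrium_bluff_fraction(pot, bet):
    """Fraction of a polarized river betting range that should be bluffs.

    The caller risks `bet` to win `pot + bet`. Making the caller
    indifferent between calling and folding:
        f * (pot + bet) - (1 - f) * bet = 0  =>  f = bet / (pot + 2 * bet)
    """
    return bet / (pot + 2 * bet)

print(equilibrium_bluff_fraction(pot=100, bet=100))  # 0.333...: pot-sized bet -> 1/3 bluffs
print(equilibrium_bluff_fraction(pot=100, bet=50))   # 0.25: half-pot bet -> 1/4 bluffs
```

A self-play learner that stops bluffing gets exploited by its own copy folding less, and one that over-bluffs gets called down, so it is pushed toward frequencies like these with no model of any particular opponent.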
Are there any ethical considerations relating to the prospect of use of this bot for cheating in real-money games? Either from your internal team or after public replication?
Very impressive. If my understanding of how the AI works is correct, it is using a pre-computed strategy developed by playing trillions of hands, but it is not dynamically updating that during game play, nor building any kind of profiles of opponents. I wonder if by playing against it many times, human opponents could discern any tendencies they could exploit. Especially if the pre-computed strategy remains static.
A hearty congratulations, Noam, on finishing another chapter of the story I opened in the early 1990s...
Another person asked "What took you so long?", and I had the same question. :) I really thought this milestone would be achieved fairly soon after I left the field in 2007. However, breakthroughs require a researcher with the right amount of reflectiveness, insight, and determination.
This is fascinating stuff. So do I understand this right: Libratus worked by computing a Nash equilibrium, while the new multiplayer version works using self-play like AlphaGo Zero? Did you run the multiplayer version against the two-player version? If yes, how did it go? Could you recommend a series of books / papers that can take me from zero to being able to reprogram this (I know programming and mathematics, but not much statistics)? And how much computing resources / time did it take to train your bot?
So let me see if I understand this. I don't believe it's hard to write a probabilistic program to play poker. That's enough to win against humans in 2-player.
With one AI and multiple professional human players sitting at a physical table, the humans outperform the probabilistic model because they take advantage of each other's mistakes/styles. Some players crash out faster but the winner gets ahead of the safe probabilistic style of play.
So this bot is better at the current professional player meta than the current players. In a 1v1 against a probabilistic model, it would probably also lose?
Am I understanding this properly? Or is playing the probabilistic model directly enough of a tell that it's also losing strategy? Meaning you need some variation of strategies, strategy detection, or knowledge of the meta to win?
Hi Noam: I'm intrigued that you trained/tested the bot against strategies that were skewed to raise a lot, fold a lot and check a lot, as well as something resembling GTO. Were there any kinds of table situations where the bot had a harder time making money? Or where the AI crushed it?
I'm thinking in particular of unbalanced tables with an ever-changing mixture of TAG and LAG play. I've changed my mind three times about whether that's humans' best refuge -- or a situation that's a bot's dream.
With the advent of AI bots in poker, chess, etc., what happens to the old adage of "Play the player, not the game"? How do modern human players manage when you don't have the psychological aspects of the game to work with?
I see on chess channels that grand masters have to rethink their whole game preparation methodology to cope with the "Alpha Zero" oddities that have now been introduced into this ancient game. They literally have to "throw out the book" of standard openings and middle games and start afresh.
This is something I've always wondered, how come bots haven't taken over online poker considering how much money there is to be made, and all you need is to be slightly better than average right? Is high level poker really that hard to achieve?
Pretty incredible that this has scaled down from 100 CPUs (and a couple terabytes of RAM) for their two player limit hold'em bot to just two CPUs for the no limit bot.
I have a question about collusion. For the 5-human + 1-AI setting, since the human pros know which player is the AI (read from your previous response), is it possible for the human players to collude to beat the AI? And in theory, for a multiplayer game, even if the AI plays the best strategy, can it still be beaten by collusion among the other players?
The fact that there's a published recipe for a superhuman bot that can be trained for $150 and run on any desktop computer sounds like an existential threat to their business.
The main mitigating factor I can think of is that you'd need to also adversarially train it so it isn't distinguishable from a skilled human. But that doesn't seem like it would be too difficult.
How do we know that online poker has ever been a fair game? Has anyone ever done a statistical study of verified real players to determine whether their collective historical winnings match what would be expected in a fair game? It seems like it would be much too easy for the operators to skim money in any one of a thousand ways. I've never understood the trust people place in online gambling in general.
I don't get this... Poker isn't purely mathematical. It has emotions involved: greed, fear, belief, reading others, manipulation to fool the opponent, and maybe more. And all of these emotions arise differently for different people based on their time, place, world view, background and history...
Are we now saying that a computer can do all this in simulation? If so, it's a great breakthrough in human history.
At the nosebleeds, poker hasn't been about those things in a long time.
Poker is about exploitative play against people who base their play on emotions, and imperfect game-theory-optimal play against players who don't. The more perfect the GTO play is, the higher the winrate against the latter group, but higher-stakes games are built around one or more bad players - pros will literally stop playing as soon as the fish busts.
Isn't it just possible that the bot got lucky? It plays well. Maybe really well, but does it play as well as a pro? Would it win nine WSOP bracelets? Would it make it to day 3 of the World Series of Poker?
Chris Moneymaker got some damn good hands. It's part of the game. It's why this feat is unremarkable and why poker is a crap game for AI. The outcomes are very loose, especially when the reason these guys are pros is partially their ability to read.
You are taking away a tool that made these poker players great and then expecting them to be a metric to test the AI. A better test would be to have pro players play sets of 1, 2, 4 and 7 basic rule-based bots while the AI does the same. Then you compare differences in play. With enough data points you can compare similar situations where the AI did better or worse. That is a fair comparison of skill.
Also, if there are professional players at a multiplayer game, the AI is getting help from the other players. Just like in Civ V, where I get help from the AI attacking itself, I'm sure this AI got help from the players attacking each other (especially if they were making the pot bigger for the AI to grab up — think of a player re-raising another player after the bot does a check all-in).
Despite the luck/noise in Poker, there are reasonable measures of performance, and while I'm not an expert in this area, the bot seems to be doing very well (see paper for details). Poker is not a "crap game for AI" it's actually quite a good game. It's a very simple example of a game with a lot of randomness (a feature not a bug) and hidden information that still admits a wide variety of skill levels (expert play is much better than intermediate play is much better than novice play). This is a great accomplishment.
I would love to get my hands on the source. Hook it up to an API like https://pokr.live and then basically build a computer-vision poker bot.
The trick is how to create natural mouse movements or keyboard inputs. This is the part I'm most shaky on, but the pokr.live API works by sending screenshots, which it translates into player actions at the table.
I was thinking a year ago about using deep reinforcement learning in a poker bot. What stopped me was the impossible amount of data and computation due to the imperfect-information nature of poker games. If I have the time, I'll try to implement something akin to the search technique described in the paper.
thomasfl | 6 years ago
https://www.cs.cmu.edu/~sandholm/cv.pdf
pesenti | 6 years ago
Science article: https://science.sciencemag.org/content/early/2019/07/10/scie...
YeGoblynQueenne | 6 years ago
samfriedman | 6 years ago
gexla | 6 years ago
noambrown | 6 years ago
n3k5 | 6 years ago
b_tterc_p | 6 years ago
bluetwo | 6 years ago
snarf21 | 6 years ago
tc | 6 years ago
andr3w321 | 6 years ago
isaacg | 6 years ago
spenczar5 | 6 years ago
Jach | 6 years ago
throwamay1241 | 6 years ago
kapurs151 | 6 years ago
tasubotadas | 6 years ago
Thanks
haburka | 6 years ago
waynecochran | 6 years ago
samfriedman | 6 years ago
pogopop77 | 6 years ago
darse | 6 years ago
Well done.
hoerzu | 6 years ago
auggierose | 6 years ago
JaRail | 6 years ago
GCA10 | 6 years ago
You've done the work. Insights welcome.
cyberferret | 6 years ago
neural_thing | 6 years ago
DevX101 | 6 years ago
jessriedel | 6 years ago
pennaMan | 6 years ago
ehsankia | 6 years ago
asdfman123 | 6 years ago
merlincorey | 6 years ago
donk2019 | 6 years ago
Thanks.
asdfman123 | 6 years ago
Will it just become increasingly sophisticated bots playing each other online?
trishume | 6 years ago
vannevar | 6 years ago
solidasparagus | 6 years ago
OpenAI Five beat the world champions in back-to-back games...
r00fus | 6 years ago
Was it online? The picture in the article seems to imply IRL.
If IRL, what inputs did it have, simply cards shown or could it read tells? Did those players know they were playing an AI?
grandtour001 | 6 years ago
rofo1 | 6 years ago
Maybe a bot technically qualifies as an opponent in durrr's challenge [0]? :)
How would bluffing influence the outcome? Both these players who are considered very strong, are known to play all kinds of hands.
[0] - https://en.wikipedia.org/wiki/Tom_Dwan#Full_Tilt_Poker_Durrr...
nishantvyas | 6 years ago
throwamay1241 | 6 years ago
luckyalog | 6 years ago
awal2 | 6 years ago
More links for reference:
https://ai.facebook.com/blog/pluribus-first-ai-to-beat-pros-...
https://science.sciencemag.org/content/early/2019/07/10/scie...
vecter | 6 years ago
That's not luck. See also: https://news.ycombinator.com/item?id=20416099
Also, Chris Moneymaker is a good poker player. He's no Phil Ivey or Tom Dwan, but he's still very good and has had decent results after his WSOP win.
w_s_l | 6 years ago
disclaimer: pokr.live API is a WIP
DeathArrow | 6 years ago
It might pay better than a full time job.