Show HN: Play poker with LLMs, or watch them play against each other

sciolist|1 month ago

This is very cool, one piece of feedback: watching the table as the AI plays while seeing the reasoning is difficult as they're on other sides of the screen. It could be nice to have the reasoning show up next to the players as they make their moves.

stevage|1 month ago

Yep, exactly. It's very difficult currently.

nivekkevin|1 month ago

Idea: can the agents make faces? 1. Programmatically--agents see each other's faces, and they can make their own. They can choose to ignore, but at least make that an input to the decision making. 2. Display them in UI--I just want to see their faces instead next to their model code names :)

sejje|1 month ago

I used to play professionally, and I still play in the casinos.

These LLMs are playing better than most human players I encounter (low limits).

They're kinda bad, but not as criminally bad as the humans.

gerdesj|1 month ago

OK so you know how it goes in poker and I should probably read the literature ...

How much of a session is based on "reading players" vs "playing the odds"?

What I am getting at is: how different is poker from, say, roulette or blackjack? My initial thoughts are that poker such as Texas hold 'em is not a game offered in a casino, so it must be mostly indeterminate. I imagine that the casino versions of poker are not Texas hold 'em.

By contrast, roulette is simply a game where the casino wins eventually with a fixed profit (thanks to 0 and a possible 00). That is all well documented.
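As a rough illustration of that fixed profit, here is a minimal sketch of the house edge on a single-number bet. It assumes the standard 35-to-1 payout and 36 numbered pockets plus one or two zero pockets; the function name is made up for this example.

```python
from fractions import Fraction

def straight_up_house_edge(zero_pockets):
    """House edge on a single-number bet paying 35 to 1 (assumed payout)."""
    pockets = 36 + zero_pockets
    p_win = Fraction(1, pockets)
    # Expected value per unit staked: win 35 with p_win, lose 1 otherwise.
    expected_value = p_win * 35 - (1 - p_win)
    return -expected_value

european = straight_up_house_edge(1)  # wheel with a single 0
american = straight_up_house_edge(2)  # wheel with 0 and 00

print(f"single zero: {float(european):.4f}, double zero: {float(american):.4f}")
```

The edge comes entirely from the zero pockets: with no zeros the same bet would have an expected value of exactly zero.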

I have only ever visited a casino once, 25 years ago, Plymouth, Devon as it turns out and I was advised to only take £50 in readies and bail out when it was gone. I came out £90 up, which was nice and my "advisor" came out £95 up (eventually, after being £200 down at one point). Sadly my "advisor" ended up bankrupt a year later.

So, how do you play a LLM? I would imagine that conversation is not allowed ...

bionsystem|1 month ago

I just watched for 5 min and no, they don't play very well. Deepseek squeezed with K4o against CO open and BTN call with full stacks. Grok 3b AI with 25bb on the button with Q4s. Those are very far from optimal play, which has been well known since solvers. I wonder how they've been trained.

projectyang|1 month ago

I'm actually surprised at how well they play pre-flop (mostly). Did some initial analysis on VPIP/PFR across positions, and somewhat decent.

Post-flop on the other hand is all over the place...
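For readers unfamiliar with the two stats: VPIP is the share of hands in which a player voluntarily puts money in preflop (a call or a raise, not a posted blind), and PFR is the share in which they raise preflop. A minimal sketch of a per-position breakdown follows; the record format is hypothetical and not the one the site uses.

```python
from collections import defaultdict

# Hypothetical record format: one entry per player per hand.
# "preflop_actions" holds only voluntary actions; posting a blind is excluded.
hands = [
    {"player": "A", "position": "BTN", "preflop_actions": ["raise"]},
    {"player": "A", "position": "BTN", "preflop_actions": ["fold"]},
    {"player": "A", "position": "BB", "preflop_actions": ["call"]},
    {"player": "A", "position": "BB", "preflop_actions": ["check"]},
]

def vpip_pfr_by_position(records):
    """Return {position: {"vpip": fraction, "pfr": fraction}}."""
    counts = defaultdict(lambda: {"hands": 0, "vpip": 0, "pfr": 0})
    for record in records:
        bucket = counts[record["position"]]
        bucket["hands"] += 1
        actions = set(record["preflop_actions"])
        if actions & {"call", "raise"}:  # voluntarily put money in
            bucket["vpip"] += 1
        if "raise" in actions:  # raised preflop
            bucket["pfr"] += 1
    return {
        position: {
            "vpip": bucket["vpip"] / bucket["hands"],
            "pfr": bucket["pfr"] / bucket["hands"],
        }
        for position, bucket in counts.items()
    }

stats = vpip_pfr_by_position(hands)
print(stats)
```

PFR can never exceed VPIP, since every preflop raise also counts as voluntarily putting money in.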

hydr0smok3|1 month ago

lol what? I just watched Grok fold pocket jacks preflop, no raise/limps ahead.

nindalf|1 month ago

I just saw GPT 5.2 do something absurd. It has a crazy amount of money ($26k) but folded with a 4-pair before the flop. That's insanely conservative, when it would have cost just $20 to see the flop. But even worse, on the very next hand it decided to place $20 down with a 5 and 4 of different suits.

In fact, all of them love folding before the flop. Most of the hands I'm seeing go like - $10 (small blind), $20 (big blind), fold, $70 bet, everyone folds. The site says "won $100", but in most of these cases that one LLM is picking up the blinds alone - $30. Chump change.

This is illuminating, but not a resource for learning poker.
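The gap between "won $100" and the real profit in the hand described above can be checked with a few lines. This is only a sketch of the arithmetic, assuming the raiser is not one of the blinds; the function and key names are made up.

```python
def pot_and_profit(contributions, winner):
    """Return (total pot, winner's net profit) for one hand."""
    pot = sum(contributions.values())
    # The winner gets the whole pot back, including their own chips.
    return pot, pot - contributions[winner]

pot, profit = pot_and_profit(
    {"small_blind": 10, "big_blind": 20, "raiser": 70},
    winner="raiser",
)
print(pot, profit)
```

The pot is the $100 the site reports, but $70 of it was the raiser's own bet, which leaves the $30 in blinds as the actual gain.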

indigodaddy|1 month ago

Modern poker (which tbf not sure if these LLMs are acting according to modern GTO or not) is highly dependent on position. Things change a lot too when/if you are in SB/BB.

jz67|1 month ago

Honest question, but this seems like an expensive project to host given the number of tokens per second. How is this being paid for?

projectyang|1 month ago

Good question! The player rooms have a rate limit per day. And as for the main table, it's actually a replay of hands I recorded the LLMs playing against each other over an extended time which eventually loops.
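One way a looping replay could keep every viewer on the same hand is to derive the position from the clock instead of from per-user state. This is a guess at the idea, not the site's code; the function name, parameters, and the fixed time per hand are all assumptions.

```python
def current_hand_index(now_s, start_s, seconds_per_hand, n_hands):
    """Index of the recorded hand that should be on screen at time now_s."""
    elapsed = max(0, now_s - start_s)
    # Integer-divide into hand slots, then wrap around to loop the recording.
    return int(elapsed // seconds_per_hand) % n_hands

# Three recorded hands, 30 seconds each (hypothetical numbers).
for t in (0, 65, 95):
    print(t, current_hand_index(t, start_s=0, seconds_per_hand=30, n_hands=3))
```

Because the index depends only on the time, serving the main table needs no LLM calls at all.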

psawaya|1 month ago

Looks like this was cleverly designed to prevent costs blowing up. There's one game shared for everyone on the main page, and up to 100 private games per day.
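A daily cap like that can be very small in code. The sketch below is in-memory and single-process, with a hypothetical class name and API; the actual project runs on Node.js and may do this quite differently.

```python
import datetime

class DailyRoomLimiter:
    """Allow at most max_rooms_per_day room creations per calendar day."""

    def __init__(self, max_rooms_per_day=100):
        self.max_rooms_per_day = max_rooms_per_day
        self._day = None
        self._count = 0

    def try_create_room(self, today=None):
        today = today or datetime.date.today()
        if today != self._day:
            # First request of a new day: reset the counter.
            self._day = today
            self._count = 0
        if self._count >= self.max_rooms_per_day:
            return False
        self._count += 1
        return True

limiter = DailyRoomLimiter(max_rooms_per_day=2)
day_one = datetime.date(2025, 1, 1)
print([limiter.try_create_room(day_one) for _ in range(3)])
```

Once the cap is hit, further requests are refused until the date changes, which matches the "limit reached for today" message some people in the thread ran into.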

sblawrie|1 month ago

Do the players (LLMs) have memory of how prior hands were played by their opponents, or know their VPIP and PFR percentages? Or is each hand stateless?

zahlman|1 month ago

I suspect this would only matter much if they also remembered (and cared about) their own prior play.

projectyang|1 month ago

Each hand is stateless

SweetSoftPillow|1 month ago

Placing full GPT 5.2 versus fast/flash models of main competitors is unfair; I would love to see a more balanced table.

shukantpal|1 month ago

This is really funny to watch and see what the LLMs are thinking. This makes me think how they would perform against a custom ML model trained with RL, e.g. https://ai.meta.com/blog/rebel-a-general-game-playing-ai-bot...

neko_ranger|1 month ago

Thank you, I'll try to grab a table when it resets :) ! I've been getting into poker (always wanted to) since I found a lecture series from Johns Hopkins, and I'm severely disappointed by my options to play online in NY (real or fake money). I just want to get reps in.

erikcw|1 month ago

Link to the lectures?

sneak|1 month ago

Needs a four color deck, and the colors on the cards of the waiting players should not be monochrome - makes it hard to evaluate what's happening in the hand. Also, a dealer button on the table would help in visually following the action.

jplata|1 month ago

Thanks for building and sharing, looks cool and is very entertaining.

I had a similar idea for people to code poker-playing bots and enter tournaments versus each other; this was pre-LLM, however.

It would be fun if you hosted a 'tournament' every month and had each of the latest releases from the major models participate and see who comes out on top.

Or perhaps open it up to others to enter and participate versus each other, where they can choose the model they want to build with and also enter custom prompt instructions to mold the play as they wish.

If you walk this path, would love to chat more.

mashlol|1 month ago

I'm not an expert, but as I understand it there are existing solvers for poker/holdem? Perhaps one of the players could be a traditional solver to see how the LLMs fare against those?

projectyang|1 month ago

While others have commented about solvers, I'd also like to bring up AI poker bots such as Pluribus (https://en.wikipedia.org/wiki/Pluribus_(poker_bot)).

This also wouldn't even be a close contest; I think Pluribus demonstrated a solid win rate against professional players in a test.

As I was developing this project, one thought kept coming to mind: how cost and performance compare between a purpose-built AI such as Pluribus and a general LLM. I think Pluribus's training cost ~$144 in cloud computing credits.

lowbatt|1 month ago

the LLMs would get crushed

sejje|1 month ago

The solvers don't typically work in real time, I don't think. They take a while to crunch a hand.

gabriel666smith|1 month ago

This is fun!

Given online is now bot-riddled, I half-finished something similar a while back, where the game was adopting and 'coaching' (a <500 character prompt was allowed every time the dealer chip passed, outside of play) an LLM player, as a kind of gambling-on-how-good-at-prompting-you-are game. Feature request! The rake could pay for the tokens, at least.

TZubiri|1 month ago

If you are interested in this space, you can check out NovaSolver.com

It's mostly a ChatGPT conversational interface over a classic Solver (Monte-Carlo simulation based), but that ease of use makes it very convenient for quick post-game analysis of hands.

I'm sure if you hook a Solver to a hud, it might be even simpler, but it's quite burdensome for amateurs, and it might be too close to cheating.

aaurelions|1 month ago

I also started working on a similar project, but I think that LLM should know and be able to keep internal statistics about players. In poker, the best hand does not always win. Often, you can win by using emotions/words. LLM should be given the ability to communicate, mislead, etc.

lowbatt|1 month ago

I like it!

I was interested in this idea too and made a video where some of the previous top LLMs play against each other https://www.youtube.com/watch?v=XsvcoUxGFmQ&t=2s

hnrich|1 month ago

Saw Grok (4 Fast) "bluff open with a suited gapper." It was Nine/Deuce of Clubs. I guess I need to expand my definition of gapper!

cmxch|1 month ago

Would be amusing if the LLMs could achieve a steady state where nobody definitively wins or loses between each other.

That is, good enough to compete amongst each other but not good enough for one to win.

nerdsniper|1 month ago

I'm fairly sure there was a bug where I won a hand that I should not have. The game code was 'lNW4RF'.

PLAYER shows A♠ 6♣ (Pair)

GPT (5.2) shows Q♠ Q♥ (Pair)

I had paired with a 6 and no aces on the board.
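For reference, when both players hold exactly one pair, the pair rank decides the hand and kickers only matter if the pairs tie. A minimal sketch of that comparison follows. The board is not listed in the thread, so the kickers below come from a hypothetical board of 6 K 9 7 2, and the helper name is made up.

```python
RANKS = "23456789TJQKA"

def one_pair_key(pair_rank, kickers):
    """Sort key for a one-pair hand: pair rank first, then kickers high to low."""
    value = RANKS.index
    return (value(pair_rank), sorted((value(k) for k in kickers), reverse=True))

# Hypothetical board 6 K 9 7 2 (no aces, as described in the comment).
sixes = one_pair_key("6", ["A", "K", "9"])    # A 6 in hand: pair of sixes
queens = one_pair_key("Q", ["K", "9", "7"])   # Q Q in hand: pair of queens

print("queens win" if queens > sixes else "sixes win")
```

A bug like the one reported could come from comparing the highest card (the ace) before the pair rank, though that is only a guess.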

casey2|1 month ago

These bots are regularly going down 20%+ on high cards duels

csomar|1 month ago

Was this vibe-coded: https://imgur.com/a/GvxA3mD ?

projectyang|1 month ago

Yep, I used claude code to help build this.

indigodaddy|1 month ago

Are the LLMs "watching" the action, or are they only apprised of previous action once it gets to them?

j_bum|1 month ago

How are these different in your mind? The history is the history.

Or do you mean - each agent has a chance to think after every turn?

koolba|1 month ago

How long till one of the LLMs makes calls out to the other LLMs to evaluate how to play the hand?

Dinux|1 month ago

This is amazing, I just wish I could pause the game and have them play step by step

indigodaddy|1 month ago

Curious if you used pokerkit for this, or some other engine or custom engine?

projectyang|1 month ago

Nope, no external poker libraries. Just a basic nodejs and socket.io server with game logic.

thinkloop|1 month ago

Cool idea. I tried to create a room but it says limit reached for today.

Descon|1 month ago

Why not texasholdllm.com?!

fumblebee|1 month ago

this could make for an interesting new benchmark

hrimfaxi|1 month ago

Would you consider open sourcing this project?

TheDudeMan|1 month ago

So strange that people are into this, but were not into the much stronger non-LLM poker agents.

ionwake|1 month ago

Why are there 2 Claude Players ?

projectyang|1 month ago

On mobile I had to squeeze the names, but on a wider view you'll see that it's Claude (Opus 4.5) and Claude (Sonnet 4.5).

hahahahhaah|1 month ago

Can we chuck a nash equilibrium player in too?

cindyllm|1 month ago

[deleted]

94 comments