eclark | 4 months ago

They would need to lie, which they can't currently do. Our current best approximation of optimal play involves ranges: you think about your hand as being any one of a number of possible holdings, imagine that you hold each combination of those hands in turn, and decide what you would do. That process of exploration by imagination doesn't work with an eager LLM using a huge encoded context.

jwatte|4 months ago

I don't think this analysis matches the underlying implementation.

The models are typically wide enough to "explore" many possible actions, score them, and let the sampler pick the next action based on the weights. (Whether a given trained parameter set will be any good at it is a different question.)
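A minimal sketch of the "score, then sample" step described above, not how any particular model implements it. The action list, the scores, and the function names are hypothetical and chosen only for illustration.

```python
import math
import random

def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities (subtract the max for numerical stability)."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(actions, scores, rng, temperature=1.0):
    """Sample one action with probability proportional to its softmax weight."""
    probs = softmax(scores, temperature)
    return rng.choices(actions, weights=probs, k=1)[0]

# Hypothetical scores for three candidate actions; not taken from any real model.
ACTIONS = ["fold", "call", "raise"]
SCORES = [0.1, 1.2, 0.7]

probs = softmax(SCORES)
picked = sample_action(ACTIONS, SCORES, random.Random(0))
print(dict(zip(ACTIONS, probs)), picked)
```

Every candidate keeps a nonzero probability, so the sampler can still pick a lower-scored action some of the time instead of always taking the top one.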

The number of attention heads for the context is similarly quite high.

And, as a matter of mechanics, the core neuron formulation (dot product input and a non-linearity) excels at working with ranges.

eclark|4 months ago

No, the widths are not wide enough to explore. The number of possible game states can easily explode beyond the number of atoms in the universe, especially if you use deep stacks with small big blinds.

For example, consider computing the counterfactual tree for 9-way preflop. Each of the 9 players can be asked to act up to 6 different times (seat 0 bets 1, seat 1 raises the min, seat 2 calls, back to seat 0 who raises the min, seat 1 calls, seat 2 raises the min, etc.). Each of those decisions can be check, fold, bet the min, raise the min (starting blinds of 100 are already pretty high), raise one more than the min, raise two more than the min, ... up to raise all in (with up to a million chips).

(1,000,000.00 - 999,900.00) ^ 6 times per round ^ 9 players. That's just for preflop. The flop, turn, river, and showdown still follow. Now imagine that we also have to simulate which cards they hold and the order in which the cards come on each street (that greatly changes the value of the pot).
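A rough back-of-the-envelope check of the expression above, as a sketch only. The base is ambiguous: taken literally, 1,000,000 - 999,900 is 100 options per decision, while one option per chip amount between the 100 minimum and a 1,000,000 stack would be 999,900. The second reading is an assumption, as are the function name and the commonly quoted figure of about 10^80 atoms. The code works in log10 to avoid printing huge integers.

```python
import math

def log10_sequence_bound(options_per_decision: int,
                         decisions_per_player: int,
                         players: int) -> float:
    """log10 of (options ** decisions_per_player) ** players, a crude upper bound."""
    return players * decisions_per_player * math.log10(options_per_decision)

LOG10_ATOMS_IN_UNIVERSE = 80  # rough, commonly quoted estimate (about 10**80)

# Reading 1: the base exactly as written, 1,000,000 - 999,900 = 100 options.
literal_bound = log10_sequence_bound(1_000_000 - 999_900, 6, 9)

# Reading 2 (assumption): one option per chip amount from the 100 minimum
# up to a 1,000,000 stack, i.e. 999,900 options.
per_chip_bound = log10_sequence_bound(1_000_000 - 100, 6, 9)

print(f"literal reading:  about 10^{literal_bound:.0f}")
print(f"per-chip reading: about 10^{per_chip_bound:.0f}")
```

Under either reading the preflop count alone lands above 10^80, before any cards or later streets are considered.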

As for LLMs being great at range stats, I would point you to the latest research by UChicago. Text-trained LLMs are horrible at multiplication. Try getting any of them to multiply any non-regular number by e or pi. https://computerscience.uchicago.edu/news/why-cant-powerful-...

Don't get me wrong, though. Masked attention and sequence-based context models are going to be critical to machines solving hidden information problems like this. Large Language Models trained on the web crawl and the stack with text input will not be those models, though.

eru|4 months ago

Why would they need to lie? Where's the lying in Poker?

(Ignore for a moment that LLMs can lie just fine.)

What you are describing is exploring a range of counterfactuals. That's not lying.

eclark|4 months ago

Early game bluffs are essentially lies that you tell through the rest of the streets. To keep your opponents from knowing when you have premium starting hands, you have to play some ranges, some of the time, as if they were a different range. E.g., 10% of the time, I will bluff and act like I have AK, KK, AA, QQ. On the next street, I will need to continue that; otherwise, it becomes unprofitable (opponents only need to wait one bet to know if I am bluffing). I have to evolve the lie as well. If cards come out that make my story more or less likely/profitable/possible, then I need to adjust the lie, not revert to the truth or the opponent's truth.
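A toy sketch of the "decide once, then keep telling the same story" idea from the paragraph above. It is not a real poker strategy: the hand labels, the function names, and the two story names are hypothetical, and it deliberately leaves out the harder part of adjusting the story as board cards arrive.

```python
import random

PREMIUM_HANDS = {"AA", "KK", "QQ", "AK"}
STREETS = ("preflop", "flop", "turn", "river")

def choose_story(hand: str, bluff_frequency: float, rng: random.Random) -> str:
    """Pick, once per hand, which range to represent: 'premium' or 'honest'."""
    if hand in PREMIUM_HANDS:
        return "premium"  # actually holding a premium hand, no lie needed
    # With probability bluff_frequency, represent a premium hand anyway.
    return "premium" if rng.random() < bluff_frequency else "honest"

def represented_line(hand: str, bluff_frequency: float, rng: random.Random):
    """Return the story told on every street; the coin is flipped only once."""
    story = choose_story(hand, bluff_frequency, rng)
    return [(street, story) for street in STREETS]

rng = random.Random(0)
line = represented_line("72", 0.10, rng)
print(line)
```

The point of flipping the coin only once is that the story on the river matches the story preflop; re-rolling on every street would give the bluff away after a single bet.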

To see that LLMs aren't capable of this, I present all of the prompt jailbreaks that rely on repeated admonitions. And that makes sense if you think about the training data. There's not a lot of human writing that takes a fact and then confidently asserts the opposite as data mounts.

LLMs produce the most likely response from the input embeddings. Almost always, the easiest continuation is one where the next token agrees with the other tokens in the sequence. The problem in poker is that a good number of the tokens in the sequence are masked and/or controlled by a villain who is actively trying to deceive.

Also, notice that I'm careful to say LLMs and not generalize to all attention head + MLP models, since attention with softmax and dot product is a good universal function approximator. Instead, it's the large language model part that makes the models not great fits for poker. Human text doesn't have a latent space that's written about enough and thoroughly enough to have poker solved in there.

lawlessone|4 months ago

>They would need to lie, which they can't currently do

They lie better than most people lol.