mckirk | 4 months ago

What would be your intuition as to which 'quality' of the LLMs this tournament then actually measures? Could we still use it as a proxy for a kind of intelligence, since they need to compensate for the fact that they are not really built to do well in a game like poker?

michalsustr|4 months ago

The tournament measures cumulative winnings. However, over a limited number of hands those can be far from their statistical expectation, because the luck of the deal introduces a lot of variance in poker.

To establish a real winner, you need to play many games:

> As seen in the Claudico match (20), even 80,000 games may not be enough to statistically significantly separate players whose skill differs by a considerable margin [1]
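
To get a feel for the scale, here is a rough back-of-the-envelope sketch. It asks how many hands it takes before a player's true edge is larger than the half-width of a two-sided confidence interval on their average winnings. The edge (0.05 big blinds per hand) and the per-hand standard deviation (10 big blinds) are illustrative assumptions of mine, not figures from the tournament or from [1], and `hands_needed` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def hands_needed(edge_bb, sd_bb, confidence=0.95):
    """Hands required before a true per-hand edge exceeds the noise.

    edge_bb: assumed true win rate in big blinds per hand (illustrative).
    sd_bb:   assumed standard deviation of per-hand winnings (illustrative).
    Solves edge = z * sd / sqrt(n) for n, treating hands as independent.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return math.ceil((z * sd_bb / edge_bb) ** 2)


# Illustrative numbers only: a small edge buried in large per-hand swings.
n = hands_needed(edge_bb=0.05, sd_bb=10.0)
print(f"hands needed: {n}")
```

Under these assumed numbers the answer comes out well above 80,000 hands, and because the required count grows with the square of sd/edge, halving the edge quadruples it.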

It is possible to reduce the number of required games thanks to variance reduction techniques [1], but I don't think this is what the website does.
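
As a toy illustration of the general idea (a plain control variate, not the specific technique used in [1]), suppose each hand's result is the player's true edge plus a "luck" term that has zero expectation and that we can compute after the fact. Subtracting that term leaves the mean unchanged but shrinks the spread. Everything in this sketch is an assumption made for illustration: the simulated game, the numbers, and the premise that the luck term is observable.

```python
import random
import statistics

random.seed(0)
TRUE_EDGE = 0.05  # assumed true win rate in big blinds per hand


def play_hand():
    """Simulate one hand: returns (winnings, luck term)."""
    luck = random.gauss(0.0, 10.0)  # zero-mean luck, assumed computable afterwards
    noise = random.gauss(0.0, 2.0)  # residual randomness we cannot explain away
    return TRUE_EDGE + luck + noise, luck


hands = [play_hand() for _ in range(20000)]
raw = [winnings for winnings, _ in hands]
# Control variate: subtract the zero-mean luck term from each result.
adjusted = [winnings - luck for winnings, luck in hands]

raw_sd = statistics.stdev(raw)
adj_sd = statistics.stdev(adjusted)
print(f"raw:      mean={statistics.mean(raw):.3f}  sd={raw_sd:.2f}")
print(f"adjusted: mean={statistics.mean(adjusted):.3f}  sd={adj_sd:.2f}")
```

Since the number of hands needed scales with the variance, cutting the standard deviation by some factor cuts the required sample by that factor squared.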

To answer the question - "which 'quality' of the LLMs this tournament then actually measures" - since we can't tell the winner reliably, I don't think we can even make particular claims about the LLMs.

However, it could be interesting to analyze the play from a "psychology profile perspective" based on the dark triad (psychopaths / machiavellians / narcissists). These personality types have been observed to prefer certain strategies, and that preference can be quantified [2].

[1] DeepStack, https://static1.squarespace.com/static/58a75073e6f2e1c1d5b36...

[2] Generation of Games for Opponent Model Differentiation, https://arxiv.org/pdf/2311.16781