(no title)
mckirk
|
4 months ago
What would be your intuition as to which 'quality' of the LLMs this tournament then actually measures? Could we still use it as a proxy for a kind of intelligence, since they need to compensate for the fact that they are not really built to do well in a game like poker?
michalsustr|4 months ago
To establish a real winner, you need to play many games:
> As seen in the Claudico match (20), even 80,000 games may not be enough to statistically significantly separate players whose skill differs by a considerable margin [1]
It is possible to reduce the number of required games thanks to variance reduction techniques [1], but I don't think this is what the website does.
To answer the question - "which 'quality' of the LLMs this tournament then actually measures" - since we can't tell the winner reliably, I don't think we can even make particular claims about the LLMs.
However, it could be interesting to analyze the play from a "psychology profile perspective" of dark triad (psychopaths / machiavellians / narcissists). Essentially, these personality types have been observed to prefer some strategies and this can be quantified [2].
[1] DeepStack, https://static1.squarespace.com/static/58a75073e6f2e1c1d5b36...
[2] Generation of Games for Opponent Model Differentiation https://arxiv.org/pdf/2311.16781