Is there any reason the number of rounds is fixed? I didn't think anything interesting could come of that:
>If the game is played exactly N times and both players know this, then it is always game theoretically optimal to defect in all rounds. The only possible Nash equilibrium is to always defect. The proof is inductive: one might as well defect on the last turn, since the opponent will not have a chance to punish the player. Therefore, both will defect on the last turn. Thus, the player might as well defect on the second-to-last turn, since the opponent will defect on the last no matter what is done, and so on. The same applies if the game length is unknown but has a known upper limit.
Also, I'm wondering if the rules prohibit collusion. The rules say one bot per person, but don't mention teams.
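The base case of that induction is just arithmetic. A quick sketch with the standard payoff matrix (values assumed here: T=5, R=3, P=1, S=0; not necessarily the tournament's):

```python
# Standard PD payoff matrix (assumed values): (my payoff, their payoff).
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def last_round_gain(opp_move):
    # Gain from defecting instead of cooperating on the final round,
    # holding the opponent's move fixed: no rounds remain, so there
    # is no future retaliation to weigh against the immediate gain.
    return PAYOFF[('D', opp_move)][0] - PAYOFF[('C', opp_move)][0]
```

Defection strictly gains on the last round whatever the opponent plays (+2 if they cooperate, +1 if they defect), which is what anchors the induction.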
I actually fixed the number of rounds on purpose--I figure the Nash equilibrium in the classical iterated PD isn't very relevant in this wildly different version of the game, and I want to see what kinds of strategies emerge with a fixed number of rounds.
I didn't explicitly forbid collusion, since it's tough to prevent it in practice; I would appreciate it if teams simply submitted one bot or several bots with different strategies.
There is a fun anecdote about how to beat tit-for-tat by using a collusion strategy. What you do is enter a team of bots, and their strategy is as follows:
1. For the first 50 rounds, behave according to a predefined sequence.
2. If your opponent also does this sequence, then cooperate in every subsequent round. Otherwise defect in every subsequent round.
I don't have a reference on hand, but from what I remember this defeated all other algorithms by a large margin in one of the main yearly Prisoner's Dilemma tournaments.
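The handshake scheme might look like this -- a hypothetical sketch, where the 50-move recognition sequence and the 'C'/'D' encoding are invented for illustration:

```python
# Any fixed 50-move sequence works as the "secret knock" (this one is made up).
HANDSHAKE = ['C', 'D'] * 25

def colluder(my_history, opp_history):
    r = len(my_history)  # index of the current round
    if r < len(HANDSHAKE):
        return HANDSHAKE[r]  # step 1: play the predefined sequence
    if opp_history[:len(HANDSHAKE)] == HANDSHAKE:
        return 'C'  # step 2: opponent played the sequence too -- a teammate
    return 'D'  # everyone else gets unconditional defection
```

Two colluders recognize each other after round 50 and cooperate forever; anyone else eats defections for the rest of the match.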
> The 1/100th of a second time limit used in this example is more than enough time for a bot that does not loop infinitely to terminate on any modern processor; simple bots like cooperateBot or titForTat will terminate in less than 1/1000th of a second. For the purposes of this tournament, you can consider 1/100th of a second a safe upper bound on simulation time when writing your own bots. In addition, the time function itself costs slightly less than 1/100th of a second in overhead.
This doesn't seem safe. Suppose I'm playing against JusticeBot, and simulate ver playing against me. Then ve simulates me playing against CooperateBot, and presumably sim-me simulates CooperateBot playing against me. The recursion ends there. But JusticeBot called time once directly, and vis simulation of me also called time, which means ve took more than 1/100 of a second to run in total.
When you call the time function, it will cut off the simulation once that 1/100th of a second limit is reached, no matter what happens inside the simulation. So even if your opponent simulates you or performs some expensive computation or even goes into an infinite loop, time will terminate in the allotted number of seconds plus the overhead. If that happens, though, it will return Nothing, so you won't have gained any information about what your opponent will do.
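This is not the tournament's actual implementation, but the cutoff semantics can be emulated in a few lines -- a POSIX-only sketch using an interval timer, where every name is invented and the real mechanism presumably differs:

```python
import signal

class _Timeout(Exception):
    pass

def _raise_timeout(signum, frame):
    raise _Timeout()

def simulate(bot, history, limit):
    """Return bot(history), or None if it exceeds `limit` seconds.

    Mirrors the described behavior: the simulation is cut off at the
    deadline no matter what the simulated bot does, and the caller
    learns nothing (None, analogous to the tournament's Nothing).
    """
    old = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.setitimer(signal.ITIMER_REAL, limit)
    try:
        return bot(history)
    except _Timeout:
        return None  # cut off: infinite loops cost only `limit` seconds
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the pending alarm
        signal.signal(signal.SIGALRM, old)
```

Even a bot that spins forever only costs the caller the allotted limit plus a little overhead, which is the property the quoted passage relies on.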
>The tit-for-tat strategy is very effective in the classical iterated prisoner's dilemma, but it is easily exploitable in the program-equilibrium version of the game. This is because the bot's decision is based entirely on the history of previous moves, so an opponent can always predict when the tit-for-tat bot will cooperate and then defect against it.
Unless I'm missing something, doing this naively doesn't let you exploit TitForTat, it just decreases your reward from (3 every round) to (5 every other round).
You can still exploit TitForTat, e.g. by defecting in the final round. But that's not a massive win for you. CooperateBot is much more exploitable.
Yep, you're exactly right--I didn't actually add up the payoffs when I wrote that. You could potentially do better against tit-for-tat variants that have a certain chance of forgiving after a defection, but not against the naive, deterministic TitForTat.
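Adding up the payoffs makes this concrete. A throwaway sketch with the standard matrix (assumed: T=5, R=3, P=1, S=0) over 100 rounds:

```python
# My payoff for each (my move, their move) pair (assumed standard values).
PAYOFFS = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def tit_for_tat(my_hist, opp_hist):
    return 'C' if not opp_hist else opp_hist[-1]

def score_against_tft(me, rounds=100):
    # Total payoff for strategy `me` over a fixed-length match vs TitForTat.
    my_hist, tft_hist, total = [], [], 0
    for _ in range(rounds):
        a = me(my_hist, tft_hist)
        b = tit_for_tat(tft_hist, my_hist)
        total += PAYOFFS[(a, b)]
        my_hist.append(a)
        tft_hist.append(b)
    return total

always_cooperate = lambda mh, oh: 'C'
alternate_defect = lambda mh, oh: 'D' if len(mh) % 2 == 0 else 'C'
last_round_defect = lambda mh, oh: 'D' if len(mh) == 99 else 'C'
```

Full cooperation scores 300; the naive alternating "exploit" drops to 250 (5 every other round, as noted above); defecting only on the final round squeezes out 302.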
A plot device based on a similar concept (including simulating other players) is used in Hannu Rajaniemi's The Quantum Thief. There is a Prisoner's Dilemma Prison and the winning entity cheats by emulating others.
How does bot A simulate bot B that must simulate bot A to determine its answer? Seems like infinite trampoline recursion.
EDIT: it seems you run the simulation against a different bot implementation than the running one and attempt to determine its characteristics given the responses and the known behaviour of the other implementation.
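A sketch of that idea, with bots modeled as functions that receive their opponent (illustrative names only, not the contest's API): probing the opponent against a simple stand-in bot lets the regress bottom out.

```python
def cooperate_bot(opponent):
    return 'C'  # ignores the opponent entirely -- this is what ends the recursion

def justice_like(opponent):
    # Simulate the opponent playing against CooperateBot and judge the result:
    # anyone who would defect against an unconditional cooperator gets defected on.
    probe = opponent(cooperate_bot)
    return 'C' if probe == 'C' else 'D'
```

Even `justice_like` playing itself terminates: the inner copy probes `cooperate_bot`, which never simulates anyone, so the chain of simulations is finite.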
Robert Axelrod's The Evolution of Cooperation: http://www.amazon.com/The-Evolution-Cooperation-Revised-Edit...
He also wrote a follow-up book, The Complexity of Cooperation: Agent-Based Models of Competition and Collaboration: http://www.amazon.com/The-Complexity-Cooperation-Agent-Based...
https://en.wikipedia.org/wiki/The_Quantum_Thief