Dn_Ab | 11 years ago
The article says this algorithm is weak against bad players, but that is more an artifact of the resources and training method than of regret minimization itself. One advantage of minimizing regret over solving the game with linear programming is that online-learning versions can adapt to exploit poor play and earn a payoff larger than the game's value.
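To make the exploitation point concrete, here is a minimal sketch, assuming regret matching as the regret-minimizing rule (one common choice, not necessarily what the article's bot uses). The game (rock-paper-scissors), the biased opponent mix, and every function name are made up for illustration, and expected payoffs are used instead of sampled actions so the run is deterministic.

```python
# Payoff to the learner in rock-paper-scissors.
# Rows are the learner's action, columns the opponent's: rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def regret_matching_strategy(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0:
        # No action has positive regret yet: fall back to uniform play.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]

def play_against_fixed(opponent, rounds=1000):
    """Run regret matching against a fixed opponent mix (an assumption here).

    Returns the learner's average expected payoff per round and its final strategy.
    """
    regrets = [0.0, 0.0, 0.0]
    total_payoff = 0.0
    for _ in range(rounds):
        strategy = regret_matching_strategy(regrets)
        # Expected payoff of each pure action against the opponent's mix.
        action_values = [
            sum(PAYOFF[a][b] * opponent[b] for b in range(3)) for a in range(3)
        ]
        expected = sum(strategy[a] * action_values[a] for a in range(3))
        total_payoff += expected
        # Regret: how much better each action would have done than what we played.
        for a in range(3):
            regrets[a] += action_values[a] - expected
    return total_payoff / rounds, regret_matching_strategy(regrets)

# Hypothetical bad player who throws rock 60% of the time.
average_payoff, final_strategy = play_against_fixed([0.6, 0.2, 0.2])
print(average_payoff, final_strategy)
```

The value of rock-paper-scissors is 0, but against this opponent the learner shifts almost immediately to paper and averages roughly 0.4 per round. Against a uniform opponent the same code averages 0, since there is nothing to exploit.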
I've also posted here before about how RM solves two-player zero-sum games more efficiently than linear programming, and how it relates to boosting, to portfolio optimization, and to natural selection, of which it can be seen as an abstraction.
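As a rough illustration of solving a zero-sum game without an LP solver, the sketch below runs regret matching in self-play and averages the strategies. It assumes RM means regret matching; the 2x2 payoff matrix is hypothetical (its value is 0.2, reached when the row player mixes 0.4/0.6), and the function names are mine. It says nothing about speed relative to linear programming, only that the averaged strategy approaches the game's value.

```python
# Hypothetical 2x2 zero-sum game; entries are payoffs to the row player.
# Its value is 0.2, reached when the row player mixes (0.4, 0.6).
GAME = [[2.0, -1.0], [-1.0, 1.0]]

def matched(regrets):
    """Regret matching: mix in proportion to positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    n = len(regrets)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n

def self_play(game, iterations=20000):
    """Both players run regret matching; return their time-averaged strategies."""
    n_rows, n_cols = len(game), len(game[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        x, y = matched(row_regret), matched(col_regret)
        # Expected payoff of each pure action against the other player's mix.
        row_values = [sum(game[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)]
        col_values = [-sum(game[i][j] * x[i] for i in range(n_rows)) for j in range(n_cols)]
        row_expected = sum(x[i] * row_values[i] for i in range(n_rows))
        col_expected = -row_expected  # zero-sum: the column player gets the negation
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_expected
            row_sum[i] += x[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_expected
            col_sum[j] += y[j]
    return [s / iterations for s in row_sum], [s / iterations for s in col_sum]

def worst_case_payoff(game, row_strategy):
    """Payoff the row strategy guarantees against a best-responding column player."""
    n_rows, n_cols = len(game), len(game[0])
    return min(
        sum(row_strategy[i] * game[i][j] for i in range(n_rows)) for j in range(n_cols)
    )

row_average, col_average = self_play(GAME)
print(row_average, worst_case_payoff(GAME, row_average))
```

Note that it is the averaged strategies, not the last iterates, that approximate the equilibrium; the guaranteed payoff of the averaged row strategy can never exceed the value and moves toward it as the iteration count grows.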