Whoa, so cool!!
You know what would be even cooler? If you could have it play any game described by the Game Description Language [1]. It looks like the project is most of the way there, since the environment methods look like calls for data that would be included in a GDL description.
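As a toy illustration of that idea (not real GDL, which is a Datalog-like logic language and would need an interpreter): a declarative game spec could supply the same data the environment methods query. All names here are hypothetical, not the project's actual API.

```python
# Hypothetical sketch: a declarative spec for tic-tac-toe that an
# environment wrapper reads instead of hard-coding the rules.
GAME_SPEC = {
    "num_actions": 9,
    "initial_state": tuple([0] * 9),          # 0 = empty cell
    "legal": lambda state: [i for i, c in enumerate(state) if c == 0],
}

class SpecEnv:
    """Environment whose rules come from a game spec, not from code."""

    def __init__(self, spec):
        self.spec = spec
        self.state = spec["initial_state"]

    def get_legal_actions(self):
        return self.spec["legal"](self.state)

env = SpecEnv(GAME_SPEC)
legal = env.get_legal_actions()  # on an empty board, every cell is legal
```

A real GDL binding would derive `legal` (and the transition and goal relations) from the logical rules rather than from a lambda.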
Noob here. How is this different from reinforcement learning libraries like:
OpenAI’s Gym
TensorFlow’s TF-Agents
ReAgent by Meta
DeepMind’s OpenSpiel
Amazon SageMaker RL
If this is a faithful reimplementation of the AlphaZero algorithm (and I haven't looked through the code to confirm whether or not it is), then you'd expect performance equal to the published results after enough iterations of training. But the author probably doesn't have the resources to train agents on the same scale as Google did, so performance in your own usage would largely come down to how long you can afford to train.
> get_legal_actions(): returns a list of legal actions
What's the expectation around your actions? It's not just 0..n for current actions with any arbitrary ordering, right? There needs to be some consistency between steps for training.
The policy function outputs a probability for every possible (legal or illegal) action. Once you have a way of indexing those actions, the policy and the game just need to refer to the same action when they use the same index.
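A minimal sketch of that convention, assuming tic-tac-toe with the fixed global indexing "action i = mark cell i" (the function name and shapes are illustrative, not tinyzero's actual API): the policy covers all actions, and illegal ones are masked out before renormalising.

```python
import math

NUM_ACTIONS = 9  # tic-tac-toe: action i = "mark cell i", ordering never changes

def masked_policy(logits, legal_actions):
    """Softmax over all action logits, then zero out illegal actions
    and renormalise so the remaining probabilities sum to 1."""
    m = max(logits)
    probs = [math.exp(l - m) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    legal = set(legal_actions)
    probs = [p if i in legal else 0.0 for i, p in enumerate(probs)]
    total = sum(probs)
    return [p / total for p in probs]

# uniform logits, only cells 0, 4, 8 are empty
pi = masked_policy([0.0] * NUM_ACTIONS, [0, 4, 8])
# illegal actions get probability 0; the three legal ones each get 1/3
```

Because the indexing is fixed for the whole game, the network's output at index i always means the same move, regardless of which actions happen to be legal on a given step.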
How would this handle games with randomness or incomplete information, such as UNO, craps, etc.? (I'd love to see what this thing does with a known losing game, just as a validation.)
AlphaZero was designed for perfect-information games. That said, the Monte Carlo Tree Search in the library can be run with any agent that implements a value and a policy function. So, while the AlphaZeroAgent in agents.py wouldn't fit the problem you're describing, implementing something like Meta's ReBeL (https://ai.meta.com/blog/rebel-a-general-game-playing-ai-bot...) shouldn't be an impossible task. The Monte Carlo Tree Search algorithm in mcts.py was written to be modular from the start, exactly to allow something like this!
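A sketch of the kind of pluggable agent described above: anything exposing a value estimate and a policy prior could drive the tree search. The class and method names here are illustrative assumptions, not the actual interface in mcts.py.

```python
# Hypothetical agent satisfying a "value + policy" duck-typed interface.
# A ReBeL-style agent would replace these with estimates computed over
# belief states, to cope with imperfect information.
class UniformAgent:
    def __init__(self, num_actions):
        self.num_actions = num_actions

    def policy(self, state):
        # uniform prior over all actions
        return [1.0 / self.num_actions] * self.num_actions

    def value(self, state):
        # placeholder value estimate; a trained network would go here
        return 0.0

agent = UniformAgent(num_actions=9)
prior = agent.policy(state=None)
```

The point of the modularity is that the search code only ever calls `policy` and `value`, so swapping in a different agent doesn't touch the tree search itself.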
mdaniel|2 years ago
You'll also likely want to mention the "needs Python >= 3.8" requirement in the readme: https://github.com/s-casci/tinyzero/blob/244a263976cd9a09f5f... OT1H, I would hope folks are keeping their Pythons current, but OTOH dev environments are gonna dev environment.
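One way to surface that requirement besides the readme is a guard at the top of an entry script, so users get a clear message instead of a confusing SyntaxError later. A minimal sketch (the error message wording is mine, not the project's):

```python
import sys

# Fail fast on old interpreters instead of dying later on 3.8-only
# syntax such as the walrus operator.
if sys.version_info < (3, 8):
    raise SystemExit(
        "tinyzero needs Python >= 3.8; found " + sys.version.split()[0]
    )
```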
s-casci|2 years ago
JoeDaDude|2 years ago
[1]. https://en.wikipedia.org/wiki/Game_Description_Language
vldmrs|2 years ago
s-casci|2 years ago
ZiggerZZ|2 years ago
vermaat|2 years ago
s-casci|2 years ago
jasonjmcghee|2 years ago
I haven’t used it in a few years but certainly was the standard back then
tomatovole|2 years ago
Reubend|2 years ago
viraptor|2 years ago
s-casci|2 years ago
Y_Y|2 years ago
s-casci|2 years ago
ilc|2 years ago
gwern|2 years ago
(You could also move straight to MuZero variations: https://arxiv.org/abs/2106.04615#deepmind https://openreview.net/forum?id=X6D9bAHhBQ1#deepmind https://openreview.net/forum?id=QnzSSoqmAvB )
s-casci|2 years ago