Show HN: Easily train AlphaZero-like agents on any environment you want

87 points | s-casci | 2 years ago | github.com

21 comments

mdaniel|2 years ago

This repo and the code files appear to be missing any licensing details

You'll also likely want to mention the "needs python >= 3.8" in the readme https://github.com/s-casci/tinyzero/blob/244a263976cd9a09f5f... OT1H, I would hope folks are keeping their pythons current, but OTOH dev environments are gonna dev environment

s-casci|2 years ago

Good catches, I've added the missing information. Thanks

JoeDaDude|2 years ago

Whoa, so cool!! You know what would be even cooler? If you could have it play any game described by the Game Description Language [1]. It looks like the project is most of the way there, since the environment methods look like calls to data that would be included in a GDL description.

[1] https://en.wikipedia.org/wiki/Game_Description_Language

vldmrs|2 years ago

Are there any sample game descriptions for some games? I have checked all the links but couldn’t find a single example.

s-casci|2 years ago

Interesting, I didn't know about it... Modifying the existing environments' interfaces shouldn't be too difficult. Feel free to submit a PR!

ZiggerZZ|2 years ago

Didn't know about this formalism! Are there any Python libraries that support GDL?

vermaat|2 years ago

Noob here. How is this different from reinforcement learning libraries like OpenAI’s Gym, TensorFlow’s TF-Agents, Meta’s ReAgent, DeepMind’s OpenSpiel, or Amazon SageMaker RL?

s-casci|2 years ago

There certainly are other projects around AlphaZero; I'd say this one is simpler and much more basic

jasonjmcghee|2 years ago

Fwiw OpenAI no longer develops or maintains “gym”, which might dissuade some folks from investing too deeply into it.

I haven’t used it in a few years but certainly was the standard back then

tomatovole|2 years ago

Do you have evaluations for how well the trained agents do (e.g. for chess, go, etc)?

Reubend|2 years ago

If this is a faithful reimplementation of the AlphaZero algorithm (and I haven't looked through the code to confirm whether or not it is), then you'd expect performance equal to the published results after enough iterations of training. But the author probably doesn't have the resources to train agents at the same scale Google did, so performance in your own usage would largely come down to how long you can afford to train for.

viraptor|2 years ago

I think this glosses over some details here:

> get_legal_actions(): returns a list of legal actions

What's the expectation around your actions? It's not just 0..n for current actions with any arbitrary ordering, right? There needs to be some consistency between steps for training.

s-casci|2 years ago

The policy function outputs the probability of taking every possible (legal or illegal) action. Once you have a way of indexing those actions, both the policy and the game need to refer to the same action when indexing the same number.
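To make that contract concrete, here is a minimal sketch of the idea under illustrative assumptions: all names (`TicTacToeEnv`, `masked_policy`, `NUM_ACTIONS`) are hypothetical and not TinyZero's actual API. The point is that the action space is fixed and globally indexed, `get_legal_actions()` returns a subset of those fixed indices, and illegal actions are masked out of the policy before renormalizing:

```python
import numpy as np

class TicTacToeEnv:
    # Hypothetical environment: action i always means "place a piece on
    # cell i", so index i refers to the same move at every step.
    NUM_ACTIONS = 9  # fixed action space; the policy head has this many outputs

    def __init__(self):
        self.board = [0] * 9

    def get_legal_actions(self):
        # Subset of the fixed 0..8 indices that are currently playable.
        return [i for i, cell in enumerate(self.board) if cell == 0]

def masked_policy(policy_logits, legal_actions):
    # The network scores every action; illegal ones are zeroed out and
    # the rest softmaxed, so policy index i and environment action i
    # always refer to the same move.
    probs = np.zeros_like(policy_logits)
    legal = np.array(legal_actions)
    exp = np.exp(policy_logits[legal] - policy_logits[legal].max())
    probs[legal] = exp / exp.sum()
    return probs

env = TicTacToeEnv()
env.board[4] = 1  # the center cell is occupied, so action 4 is illegal
logits = np.zeros(TicTacToeEnv.NUM_ACTIONS)
probs = masked_policy(logits, env.get_legal_actions())
# probs[4] is 0; the remaining 8 legal actions each get probability 1/8
```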

Y_Y|2 years ago

Maybe I'll finally be able to train a worthy opponent for Carcassonne!

s-casci|2 years ago

If you do that, please submit a PR!

ilc|2 years ago

How would this handle games with random or incomplete information? Such as UNO, craps, etc. (I'd love to see what this thing does with a known losing game, just as a validation.)

gwern|2 years ago

The standard AlphaZero doesn't handle that. For that you'd need to graduate to more complex variants like the aforementioned ReBeL, AlphaZe* https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10213697/ or BetaZero https://arxiv.org/abs/2306.00249 or ExIt-OOS https://arxiv.org/abs/1808.10120 or Player of Games https://arxiv.org/abs/2112.03178#deepmind .

(You could also move straight to MuZero variations: https://arxiv.org/abs/2106.04615#deepmind https://openreview.net/forum?id=X6D9bAHhBQ1#deepmind https://openreview.net/forum?id=QnzSSoqmAvB )

s-casci|2 years ago

AlphaZero has been made for perfect information games. That said, the Monte Carlo Tree Search in the library can be run with any agent that implements a value and policy function. So, while the AlphaZeroAgent in agents.py wouldn't fit the problem you are describing, implementing something like Meta's ReBeL (https://ai.meta.com/blog/rebel-a-general-game-playing-ai-bot...) shouldn't be an impossible task. The Monte Carlo Tree Search algorithm in mcts.py has been written to be modular from the start exactly to do something like this!
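As a rough illustration of what "any agent that implements a value and policy function" could mean, here is a hedged sketch with a toy environment; every name (`CountdownEnv`, `UniformRolloutAgent`, the method signatures) is hypothetical and not taken from TinyZero's agents.py or mcts.py. The agent supplies uniform priors and random-rollout value estimates, the two callbacks a modular MCTS would query:

```python
import random

class CountdownEnv:
    # Toy game: start at n, each move subtracts 1 or 2; n == 0 is terminal.
    def __init__(self, n=5):
        self.n = n

    def copy(self):
        return CountdownEnv(self.n)

    def get_legal_actions(self):
        return [1, 2] if self.n >= 2 else [1]

    def step(self, action):
        self.n -= action

    def is_terminal(self):
        return self.n == 0

    def result(self):
        return 1.0  # dummy terminal reward

class UniformRolloutAgent:
    """Baseline agent: uniform priors, random-rollout value estimates."""

    def policy(self, env):
        # Prior probability over the currently legal actions.
        legal = env.get_legal_actions()
        return {a: 1.0 / len(legal) for a in legal}

    def value(self, env, rollouts=10):
        # Estimate the state's value by averaging random playouts.
        total = 0.0
        for _ in range(rollouts):
            sim = env.copy()
            while not sim.is_terminal():
                sim.step(random.choice(sim.get_legal_actions()))
            total += sim.result()
        return total / rollouts

agent = UniformRolloutAgent()
env = CountdownEnv(5)
priors = agent.policy(env)  # {1: 0.5, 2: 0.5}
v = agent.value(env)        # 1.0, since every playout ends with reward 1.0
```

Swapping in a trained network, or a ReBeL-style belief-state evaluator, would only change what `policy` and `value` compute, not the search loop that calls them.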