top | item 45107445

(no title)

moyix | 6 months ago

Note that MuZero did better than AlphaGo, without access to preprogrammed rules: https://en.wikipedia.org/wiki/MuZero

discuss

order

smokel|6 months ago

Minor nitpick: it did not use preprogrammed rules for scanning through the search tree, but it does use preprogrammed rules to enforce that no illegal moves are made during play.

hulium|6 months ago

During play, yes, obviously you need an implementation of the game to play it. But in its planning tree, no:

> MuZero only masks legal actions at the root of the search tree where the environment can be queried, but does not perform any masking within the search tree. This is possible because the network rapidly learns not to predict actions that never occur in the trajectories it is trained on.

https://arxiv.org/pdf/1911.08265

CGamesPlay|6 months ago

This is true, and MuZero's paper notes that it did better with less computation than AlphaZero. But it still used about 10x more computation to get there than AlphaGo, which was "bootstrapped" with human expert moves. I think this is very important context to anyone who is trying to implement an AI for their own game.