Minor nitpick: it did not use preprogrammed rules for scanning through the search tree, but it does use preprogrammed rules to enforce that no illegal moves are made during play.
During play, yes, obviously you need an implementation of the game to play it. But in its planning tree, no:
> MuZero only masks legal actions at the root of the search tree where the environment can be queried, but does not perform any masking within the search tree. This is possible because the network rapidly learns not to predict actions that never occur in the trajectories
it is trained on.
This is true, and MuZero's paper notes that it did better with less computation than AlphaZero. But it still used about 10x more computation to get there than AlphaGo, which was "bootstrapped" with human expert moves. I think this is very important context to anyone who is trying to implement an AI for their own game.
smokel|6 months ago
hulium|6 months ago
> MuZero only masks legal actions at the root of the search tree where the environment can be queried, but does not perform any masking within the search tree. This is possible because the network rapidly learns not to predict actions that never occur in the trajectories it is trained on.
https://arxiv.org/pdf/1911.08265
CGamesPlay|6 months ago