ta_tunestub | 6 years ago
I have the same question. Not sure I have an answer yet, but this paper includes some pseudocode that implements the algorithm: https://arxiv.org/src/1911.08265v1/anc/pseudocode.py
I'm planning to train it on something simple like TicTacToe, both to see whether it works and to understand how it works.
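For anyone wanting to try the same thing, here is a rough sketch of the kind of TicTacToe environment such an experiment would need. It is only the game rules, not the algorithm from the paper. The class name `TicTacToe`, the method names (`legal_actions`, `apply`, `terminal`, `winner`) and the `random_playout` helper are my own assumptions for illustration; check the linked pseudocode for the interface it actually expects.

```python
import random
from typing import List, Optional

# All eight winning lines on a 3x3 board, cells indexed 0..8 row by row.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


class TicTacToe:
    """Hypothetical minimal environment; not taken from the paper's pseudocode."""

    def __init__(self) -> None:
        # 0 = empty, 1 = first player, -1 = second player.
        self.board: List[int] = [0] * 9
        self.to_play: int = 1

    def legal_actions(self) -> List[int]:
        """Indices of the empty cells."""
        return [i for i, v in enumerate(self.board) if v == 0]

    def winner(self) -> Optional[int]:
        """Return 1 or -1 if that player has three in a row, else None."""
        for a, b, c in WIN_LINES:
            total = self.board[a] + self.board[b] + self.board[c]
            if total == 3:
                return 1
            if total == -3:
                return -1
        return None

    def terminal(self) -> bool:
        """The game ends on a win or when the board is full."""
        return self.winner() is not None or not self.legal_actions()

    def apply(self, action: int) -> None:
        """Place the current player's mark and pass the turn."""
        if self.board[action] != 0:
            raise ValueError(f"cell {action} is already occupied")
        self.board[action] = self.to_play
        self.to_play = -self.to_play


def random_playout(seed: int = 0) -> TicTacToe:
    """Play uniformly random legal moves until the game ends."""
    rng = random.Random(seed)
    game = TicTacToe()
    while not game.terminal():
        game.apply(rng.choice(game.legal_actions()))
    return game
```

A random playout like this is a cheap sanity check on the rules before plugging in any learned policy; the state space is small enough that mistakes show up quickly.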