top | item 28767825

cbutner | 4 years ago

The original hope was for this to be a third head on top of the AlphaZero model, but I couldn't think of a way to generate commentary during self-play (such that it would gradually improve), and trying to rotate supervised commentary training into the main schedule ended up hurting both sides because of the disjoint datasets.

So, now the commentary decoder is just trained separately on the final primary model. The previous and current game positions are fed into the primary model, and the outputs are taken from the final convolutional layer, just before the value and policy heads. Then, that data plus the side to play is positionally encoded and fed into a transformer decoder.
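The encoding step described here can be sketched in a few lines. Channel count, board-token layout, and the use of a standard sinusoidal positional encoding are assumptions for illustration; ChessCoach's actual dimensions and encoding may differ.

```python
import numpy as np

# Assumed shapes: the primary model's final convolutional layer emits an
# 8x8 board of feature channels (the channel count here is a guess).
CHANNELS, BOARD = 256, 8

def encode_commentary_input(prev_features, curr_features, side_to_play):
    """Flatten previous/current position features into board-square tokens,
    append a side-to-play token, and add sinusoidal positional encoding."""
    # (CHANNELS, 8, 8) -> (64, CHANNELS): one token per board square.
    prev_seq = prev_features.reshape(CHANNELS, -1).T
    curr_seq = curr_features.reshape(CHANNELS, -1).T
    side = np.full((1, CHANNELS), side_to_play, dtype=np.float32)
    seq = np.concatenate([prev_seq, curr_seq, side], axis=0)

    # Standard sinusoidal positional encoding over the sequence dimension.
    pos = np.arange(seq.shape[0])[:, None]
    dim = np.arange(CHANNELS)[None, :]
    angles = pos / np.power(10000.0, (2 * (dim // 2)) / CHANNELS)
    pe = np.where(dim % 2 == 0, np.sin(angles), np.cos(angles))
    return seq + pe  # fed to the transformer decoder as its memory/context

prev = np.random.randn(CHANNELS, BOARD, BOARD).astype(np.float32)
curr = np.random.randn(CHANNELS, BOARD, BOARD).astype(np.float32)
memory = encode_commentary_input(prev, curr, side_to_play=1.0)
print(memory.shape)  # (129, 256): 64 + 64 square tokens + 1 side-to-play token
```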

It would be better for a search tree/algorithm to be used for commentary too so that tactics could be better understood, but that would need some kind of subjective BLEU equivalent, and metrics like those don't work well for chess commentary.

You can see a diagram of the architecture here: https://chrisbutner.github.io/ChessCoach/high-level-explanat...

thomasahle | 4 years ago

I think training this as a separate head on top of a frozen AlphaZero model makes a lot of sense. I don't think anyone has figured out how to do language learning with reinforcement training.

Actually, I can't figure out from your explanation why you trained the whole network yourself instead of just using Leela's network and training the commentary head on top?

If you wanted to incorporate the search, maybe you could just take the 1800 or so probabilities output by the MCTS and add some layers on top of that before concatenating with the other data fed into the transformer.
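The suggestion above can be sketched minimally: project the ~1800 MCTS move probabilities through a couple of dense layers, then concatenate the result with the other decoder inputs. The layer widths, the 512-dim "other features," and the random weights are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# ~1800 move probabilities from MCTS visit counts (the size is the
# commenter's figure; the layer widths below are arbitrary assumptions).
N_MOVES, HIDDEN, OUT = 1800, 256, 128

W1 = rng.normal(0, 0.02, (N_MOVES, HIDDEN))
W2 = rng.normal(0, 0.02, (HIDDEN, OUT))

def embed_search_policy(visit_counts):
    """Normalize MCTS visit counts into probabilities, then project them
    through two dense ReLU layers into a fixed-size embedding."""
    probs = visit_counts / visit_counts.sum()
    h = np.maximum(probs @ W1, 0.0)  # ReLU
    return np.maximum(h @ W2, 0.0)

visits = rng.integers(0, 100, N_MOVES).astype(np.float64)
search_embedding = embed_search_policy(visits)

# Concatenate with the other features fed into the transformer decoder
# (here just a placeholder 512-dim vector).
other_features = rng.normal(size=512)
decoder_input = np.concatenate([other_features, search_embedding])
print(decoder_input.shape)  # (640,)
```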

In either case, this is a fantastic project and perhaps an even more impressive write-up! Congrats and thank you!

cbutner | 4 years ago

It was partly because I was looking to improve self-play and training tractability on a home desktop with 1 GPU (complete failure), and partly to learn about everything from scratch. I would be interested to see how strong it is with the same search but with Leela's inference backend (for GPU at least) and network.

In terms of search-into-commentary, concatenating like that may be interesting, as long as it can learn to map across - definitely plausible without too much work. I was originally thinking of something more complicated - combining multiple raw network outputs across the tree through some kind of trained weighting, or an additional model via recurrence - but punted on it.

Ignore my BLEU comment - I mixed those up between replies. That was the other potential use of search trees for commentary: an MCTS/PUCT-style alternative to traditional sequential top-k/top-p sampling, once you have logits and are deciding which paragraph to generate.
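For reference, the sequential top-p (nucleus) sampling baseline being contrasted here looks roughly like this; the example logits are made up, and a tree-search alternative would instead score whole candidate continuations rather than sampling one token at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_p_sample(logits, p=0.9):
    """Nucleus (top-p) sampling: keep the smallest set of tokens whose
    cumulative probability reaches p, then sample from that set."""
    probs = np.exp(logits - logits.max())  # stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]        # tokens by descending probability
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # smallest nucleus covering p
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return rng.choice(nucleus, p=nucleus_probs)

# With these logits and p=0.9, only the two most likely tokens survive.
logits = np.array([3.0, 2.5, 1.0, -1.0, -2.0])
token = top_p_sample(logits)
```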

Thanks!