Superhuman Stratego with RL and test-time search

2 points | algo_trader | 3 months ago | arxiv.org

1 comment

algo_trader | 3 months ago

Only 2000 GPU hours.

Heavily customized network.

95% win rate in a recent human tournament sample.

Several training techniques for evaluation/learning rate.