(no title)
andy_xor_andrew | 8 months ago
For example, you can take a dataset of chess games played by agents that move completely at random (with no strategy at all) and feed it to Q-learning. Because Q-learning is off-policy, it will in principle still converge on an optimal policy, provided the data covers the relevant state-action pairs often enough (albeit more slowly than it would with higher-quality inputs).
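A minimal sketch of that idea, with a toy five-state corridor standing in for chess (a chess-sized table is not practical). Everything here is an assumption made for illustration: the environment, the names `step`, `collect_random_transitions`, and `q_learning_from_dataset`, and the values of `GAMMA` and `ALPHA`. The data comes only from a uniformly random agent, yet tabular Q-learning over that fixed dataset should recover the greedy "always move right" policy.

```python
import random

# Hypothetical toy environment: states 0..4 in a corridor, state 4 is terminal.
N_STATES = 5
ACTIONS = (-1, +1)   # move left or move right
GAMMA = 0.9          # discount factor (illustrative value)
ALPHA = 0.1          # learning rate (illustrative value)


def step(state, action):
    """Deterministic transition; reward 1.0 only on reaching the last state."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    reward = 1.0 if done else 0.0
    return nxt, reward, done


def collect_random_transitions(n_episodes, rng):
    """Log (s, a, r, s', done) tuples from an agent that acts uniformly at random."""
    data = []
    for _ in range(n_episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            data.append((state, action, reward, nxt, done))
            state = nxt
    return data


def q_learning_from_dataset(data, n_passes, rng):
    """Tabular Q-learning over a fixed dataset; shuffles `data` in place each pass."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(n_passes):
        rng.shuffle(data)
        for s, a, r, s2, done in data:
            # Off-policy target: bootstrap from the best next action,
            # regardless of what the random agent actually did next.
            best_next = 0.0 if done else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the learned table."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]


rng = random.Random(0)
data = collect_random_transitions(200, rng)
q = q_learning_from_dataset(data, 50, rng)
policy = greedy_policy(q)
print(policy)
```

The key line is the `max` over next actions in the update target: it evaluates the greedy policy rather than the behavior that generated the data, which is why random play is usable as input at all.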
Ericson2314 | 8 months ago