dgant | 2 years ago
The benefit of RL in general is that you're training on states the agent is likely to find itself in, and the cost is needing an agent that explores salient states. That's why we keep seeing RL as a finishing step after imitation (e.g. AlphaStar first learning StarCraft from replays).