dgant | 2 years ago

The distinction that makes it RL is that the model is trained on data produced by the model itself.

The benefit of RL in general is that you're training on states the agent is likely to find itself in; the cost is needing an agent that explores salient states. That's why we keep seeing RL as a finishing step after imitation (e.g., AlphaStar first learning StarCraft from replays).
