top | item 46517618

(no title)

breakds | 1 month ago

The sample efficiency of the RL algorithm, even for simple games, is not very good. This usually means that we will need a lot of episodes for the policy to learn to excel. Being able to run policy in an environment that can parallel and accelerate could be very helpful for the improvement - for example running a batch of browsers or tabs simultaneously :)

discuss

No comments yet.