top | item 16860826

BrandonSmithJ | 7 years ago

Is this similar to Dyna-Q learning, but with modeling/simulation being handled by the RNN?

It looks like the VAE is just used to create a feature vector, so the main difference seems to be in the MDN-RNN - which is taking the place of the usual state/action simulation in Dyna-Q.
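For reference, the Dyna-Q side of the comparison boils down to interleaving real Q-updates with extra updates replayed from a memorized transition table. A minimal sketch on a toy 5-state chain MDP (the environment and all hyperparameters here are made up for illustration):

```python
import random

# Toy deterministic chain: states 0..4, actions 0 (left) / 1 (right),
# reward 1 only on reaching the terminal state 4.
N_STATES, ACTIONS, GOAL = 5, (0, 1), 4

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == GOAL)

def dyna_q(episodes=50, planning_steps=10, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # (s, a) -> (s', r): a table of memorized transitions
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # epsilon-greedy action from real experience
            a = rng.choice(ACTIONS) if rng.random() < eps else max(ACTIONS, key=lambda b: Q[(s, b)])
            s2, r = step(s, a)
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
            model[(s, a)] = (s2, r)
            # planning: cheap simulated updates replayed from the model
            for _ in range(planning_steps):
                ps, pa = rng.choice(list(model))
                ps2, pr = model[(ps, pa)]
                Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, b)] for b in ACTIONS) - Q[(ps, pa)])
            s = s2
    return Q

Q = dyna_q()
# The greedy policy should prefer moving right toward the goal everywhere.
print(all(Q[(s, 1)] > Q[(s, 0)] for s in range(GOAL)))
```

The `model` dict is exactly the "table of memorized examples" the reply below contrasts with a learned latent model: it can only replay transitions it has literally seen.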

Cybiote | 7 years ago

Yeah, it's the same general principle of using a model to cheaply speed up policy learning. An advantage of their approach, however, is that it learns a latent space and generalizes better.

The VAE learns a compressed vector, and the latent variables are somewhat meaningful. The VAE can also be sampled from; it is not just a table of memorized examples. The RNN maintains coherence with the actions and observations of previous time-steps, and a separate controller is learned on top. The end result is that their approach is richer and more flexible.
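The decomposition described above can be sketched as three stubbed components wired together. Everything below is a hypothetical stand-in (random linear maps and assumed dimensions), not the paper's learned networks; it only shows how observations flow through the encoder, the recurrent model, and the controller:

```python
import numpy as np

# Assumed dimensions and random-weight stand-ins for V (VAE encoder),
# M (recurrent model), and C (controller); real versions are learned.
rng = np.random.default_rng(0)
OBS, Z, H, A = 64, 8, 16, 3
W_enc = rng.normal(size=(Z, OBS)) * 0.1      # stub for the VAE encoder
W_h = rng.normal(size=(H, H + Z + A)) * 0.1  # stub for the RNN transition
W_c = rng.normal(size=(A, Z + H)) * 0.1      # controller weights

def act(obs, h):
    z = np.tanh(W_enc @ obs)                      # V: compress obs into latent z
    a = np.tanh(W_c @ np.concatenate([z, h]))     # C: act on (z, h), not raw pixels
    h = np.tanh(W_h @ np.concatenate([h, z, a]))  # M: carry z and a forward in time
    return a, h

h = np.zeros(H)
for _ in range(5):
    obs = rng.normal(size=OBS)  # stand-in for an environment frame
    a, h = act(obs, h)
print(a.shape, h.shape)
```

The point of the wiring is the one made above: the controller sees only the compact `(z, h)` pair, with the recurrent state `h` providing coherence across time-steps.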