(no title)
BrandonSmithJ | 7 years ago
It looks like the VAE is just used to create a feature vector, so the main difference seems to be in the MDN-RNN - which is taking the place of the usual state/action simulation in Dyna-Q.
BrandonSmithJ | 7 years ago
It looks like the VAE is just used to create a feature vector, so the main difference seems to be in the MDN-RNN - which is taking the place of the usual state/action simulation in Dyna-Q.
Cybiote|7 years ago
The VAE learns a compressed vector and the latent variables are somewhat meaningful. The VAE can also be sampled from and is not just a table of memorized examples. The RNN maintains coherence with actions and observations of previous time-steps and a separate controller is also learned. The end result is their approach is richer and more flexible.