gajomi | 5 years ago

It seems to me that they are basically describing a variational formulation of the "optimization perspective" of reinforcement learning, which is cool, but I am confused... where is the supervised learning? Like what is the input and what is the output?

bnegreve|5 years ago

The way I understand it, the two subproblems are supervised in the sense that they are trained on data sampled from a fixed distribution, rather than from a distribution that changes as you update your model, as is usually the case in RL. This makes training more stable.
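
To make the distinction concrete, here is a toy Python sketch. It is not taken from the paper: the function names, the 50/50 dataset, and the one-parameter Bernoulli "policy" are all assumptions made up for illustration. It only shows that data drawn from a fixed dataset looks the same no matter how the model changes, while data drawn from the current policy drifts as the policy is updated.

```python
import random

def sample_fixed(dataset, rng, n):
    """Supervised-style sampling: draw from a fixed dataset.
    The data distribution does not depend on the model at all."""
    return [rng.choice(dataset) for _ in range(n)]

def sample_on_policy(p_action_1, rng, n):
    """RL-style sampling: actions come from the current policy,
    so the data distribution shifts whenever the policy changes."""
    return [1 if rng.random() < p_action_1 else 0 for _ in range(n)]

rng = random.Random(0)
dataset = [0, 1] * 50  # hypothetical fixed dataset: an even mix of two actions

fixed_means = []
policy_means = []
# Pretend the policy parameter moves from 0.1 to 0.9 over the course of training.
for p in (0.1, 0.5, 0.9):
    fixed = sample_fixed(dataset, rng, 2000)
    on_policy = sample_on_policy(p, rng, 2000)
    fixed_means.append(sum(fixed) / len(fixed))
    policy_means.append(sum(on_policy) / len(on_policy))

print("fixed data, fraction of action 1:    ", fixed_means)
print("on-policy data, fraction of action 1:", policy_means)
```

The fixed-data fractions should stay near 0.5 across all three "updates", while the on-policy fractions track the policy parameter. That moving target is what, as far as I can tell, makes ordinary RL training less stable.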

jonnycomputer|5 years ago

Thanks for clarifying that point.

Cmmn_Dscndnt|5 years ago

It seems more as if the authors are abusing terms from Machine Learning like "Supervised Learning".

jonnycomputer|5 years ago

Abusing how?