top | item 39253230

(no title)

eggie5 | 2 years ago

if you have an NN that is probabilistic, how do you update the prior after sampling from the posterior?

discuss

order

gwern|2 years ago

You take the action which you computed to be optimal under the hypothetical of your posterior sample; this then yields a new observation. You add that to the dataset, and train a new NN.

eggie5|2 years ago

ah, so observe the reward and then take a gradient step