item 39253230 (no title)

eggie5 | 2 years ago
if you have an NN that is probabilistic, how do you update the prior after sampling from the posterior?

gwern | 2 years ago
You take the action which you computed to be optimal under the hypothetical of your posterior sample; this then yields a new observation. You add that to the dataset, and train a new NN.

eggie5 | 2 years ago
ah, so observe the reward and then take a gradient step
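
The loop gwern describes (sample from the posterior, act optimally under that sample, observe, add to the dataset, refit) can be sketched on a toy problem. This is only an illustration under stated assumptions, not the setup anyone in the thread was using: a Beta-Bernoulli bandit stands in for the probabilistic NN, so "train a new NN" becomes "recompute the Beta posterior from the whole dataset". The names `fit_posterior`, `thompson_loop`, and `true_probs` are hypothetical.

```python
import random


def fit_posterior(dataset, n_arms):
    """Refit from scratch on the full dataset (stand-in for 'train a new NN').

    Returns one (alpha, beta) pair per arm, starting from a uniform Beta(1, 1) prior.
    """
    params = [[1, 1] for _ in range(n_arms)]
    for arm, reward in dataset:
        if reward:
            params[arm][0] += 1  # one more observed success for this arm
        else:
            params[arm][1] += 1  # one more observed failure for this arm
    return [tuple(p) for p in params]


def thompson_loop(true_probs, n_rounds, seed=0):
    """Run the sample -> act -> observe -> refit loop on a Bernoulli bandit.

    true_probs is the (assumed, hidden) success probability of each arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    dataset = []
    posterior = fit_posterior(dataset, n_arms)
    for _ in range(n_rounds):
        # 1. Draw one hypothesis about every arm from the current posterior.
        sample = [rng.betavariate(a, b) for a, b in posterior]
        # 2. Take the action that would be optimal if that hypothesis were true.
        arm = max(range(n_arms), key=lambda i: sample[i])
        # 3. Acting yields a new observation from the real environment.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # 4. Add it to the dataset and refit; the result is the next round's prior.
        dataset.append((arm, reward))
        posterior = fit_posterior(dataset, n_arms)
    return dataset, posterior


dataset, posterior = thompson_loop([0.2, 0.5, 0.8], n_rounds=200)
```

In this conjugate stand-in, refitting on the whole dataset and incrementing the counts for just the new observation give the same posterior. Whether a single gradient step is an adequate substitute for retraining an actual NN is not something the visible part of the thread settles.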