
PanoptesYC | 5 months ago

Yes. The paper describes the basic model as follows:

"We consider the basic model with IID rewards, called stochastic bandits. An algorithm has K possible actions to choose from, a.k.a. arms, and there are T rounds, for some known K and T. In each round, the algorithm chooses an arm and collects a reward for this arm. The algorithm's goal is to maximize its total reward over the T rounds."
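To make that loop concrete, here is a minimal simulation sketch. It makes assumptions the quote does not: each arm pays a Bernoulli reward with a fixed mean, drawn IID on every pull, and the arm-choosing rule is UCB1, a standard textbook algorithm picked only as an example (the quoted passage does not name one). The names `run_bandit` and `ucb1` are hypothetical.

```python
import math
import random


def run_bandit(means, T, choose, seed=0):
    """Simulate T rounds of a stochastic bandit with K = len(means) arms.

    Assumption for illustration: arm a pays reward 1 with probability
    means[a] and 0 otherwise, drawn IID each time it is pulled.
    """
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K   # how many times each arm has been pulled
    sums = [0.0] * K   # total reward collected from each arm
    total = 0.0
    for t in range(T):
        arm = choose(t, counts, sums)          # the algorithm picks an arm
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward                        # goal: maximize this total
    return total, counts


def ucb1(t, counts, sums):
    """UCB1 (example policy): try each arm once, then pick the arm with the
    highest empirical mean plus confidence bonus sqrt(2 ln t / n_a)."""
    for a, n in enumerate(counts):
        if n == 0:
            return a
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]),
    )


if __name__ == "__main__":
    total, counts = run_bandit([0.2, 0.5, 0.8], T=10_000, choose=ucb1)
    print(total, counts)
```

With these made-up means, the third arm is best, so a reasonable algorithm should end up pulling it far more often than the other two.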
