Brystephor | 5 months ago
* We have a single optimization goal (e.g., the metric to minimize or maximize). The hard part isn't defining an optimization goal; it's identifying the tradeoff stakeholders actually want between different metrics. If you have goal A and goal B, where B is more aggressive than A, then getting agreement on which is better is hard. This is a people problem.
* MAB seems to be a good proof of concept that something can be optimized, but it isn't an "end game" optimization path.
* Using a MAB for A/B testing is going to mess up your A/B data and make everything more difficult. You should have a treatment that uses the MAB algorithm and a treatment that doesn't.
All of the above is for non-contextual MAB. I am currently learning about different MAB algorithms, although they are all pretty similar. The ones I've read about are all effectively linear/logistic regression; the differences come from the exploration mechanism and how uncertainty is represented. Epsilon-greedy has no notion of uncertainty; exploration just happens a fixed percentage of the time. UCB is optimistic about uncertainty, and the amount of optimism controls exploration. Thompson sampling represents uncertainty with statistical (Beta) distributions, and exploration happens less as confidence about a particular set of options increases.
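To make the Thompson sampling description concrete, here's a minimal sketch of the Beta-distribution variant for Bernoulli (click/no-click style) rewards. The class name, arm count, and "true" conversion rates are all made up for illustration; a real system would feed in live reward data instead of a simulator:

```python
import random

class ThompsonBernoulliBandit:
    """Thompson sampling for Bernoulli rewards using Beta posteriors."""

    def __init__(self, n_arms):
        # Beta(1, 1) is a uniform prior over each arm's success rate.
        self.alpha = [1] * n_arms  # successes + 1
        self.beta = [1] * n_arms   # failures + 1

    def select_arm(self):
        # Draw one sample from each arm's posterior and play the argmax.
        # Early on the posteriors are wide, so exploration happens often;
        # as counts accumulate they narrow and exploration tapers off.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return samples.index(max(samples))

    def update(self, arm, reward):
        # Bernoulli likelihood: a success bumps alpha, a failure bumps beta.
        if reward:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1

# Hypothetical simulation: three arms with different conversion rates.
random.seed(0)
true_rates = [0.2, 0.5, 0.8]
bandit = ThompsonBernoulliBandit(len(true_rates))
counts = [0] * len(true_rates)
for _ in range(2000):
    arm = bandit.select_arm()
    counts[arm] += 1
    bandit.update(arm, random.random() < true_rates[arm])
```

After a couple thousand rounds, the pull counts concentrate heavily on the best arm, which is exactly the "exploration decreases as confidence increases" behavior described above. Swapping `select_arm` for "pick a random arm with probability epsilon, else the empirical best" gives epsilon-greedy; adding an optimism bonus to the empirical means gives UCB.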
Overall it's a fun area to work in that's quite different from typical CRUD development, which is a nice change of pace.