plants | 2 years ago
Specifically for A/B or A/B/N testing, you can use beta-Bernoulli bandits, which give you confidence about which experience is best and will converge on the optimal experience faster than a standard hypothesis test. The challenge is that you have to frequently recompute which experience is currently best and dynamically reallocate your traffic accordingly. They also only work on a single metric, so if your overall evaluation criterion isn't just something like "click-through rate", this type of testing becomes more difficult (if anyone else knows how multiple competing metrics are optimized with bandits, feel free to chime in).
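For the curious, here's a minimal sketch of the idea (Thompson sampling over a beta-Bernoulli bandit; class and method names are my own, not from any particular library):

```python
import random

class BetaBernoulliBandit:
    """Thompson sampling over N arms (e.g. N page variants in an A/B/N test)."""

    def __init__(self, n_arms, prior_alpha=1.0, prior_beta=1.0):
        # Beta(1, 1) is uniform: no initial preference between arms.
        self.alpha = [prior_alpha] * n_arms
        self.beta = [prior_beta] * n_arms

    def select_arm(self):
        # Sample a plausible CTR from each arm's posterior, play the argmax.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, clicked):
        # Conjugate update: a success bumps alpha, a failure bumps beta.
        if clicked:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1
```

Because `select_arm` samples from the posterior rather than picking a fixed split, traffic reallocates itself toward the winning arm automatically as evidence accumulates.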
abhgh | 2 years ago
There are some caveats though - and I mention these from the experience of running such solutions at a large scale in production. First, BB-MABs can't adapt to context by design: they only look at click/no-click behavior across the whole population. So if your population has two distinct segments - youth and elderly - who behave very differently wrt purchases, the BB-MAB won't pick a different winning advt. per group; it's blind to these groups.
The solution is to use something like a contextual MAB, which assimilates user features (or whatever else you might throw at it) into the bandit. There are simple ways to adapt plain MABs to the contextual setup [2] (in my experience these can also be effective), but of course the literature in this area is wide and deep.
A second caveat is that if the ratio of the size of the pool of advts. to the number of impressions is high, the BB-MAB won't converge, or will converge to a poor optimum; the search space is simply too large relative to the data. In cases like this it becomes important to begin with the right Beta priors, instead of the standard recipe of starting with a Beta that looks like a uniform distribution.
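One convenient way to encode such a prior (a common parameterization, sketched here with a hypothetical helper) is by its mean and an effective sample size: Beta(alpha, beta) has mean alpha / (alpha + beta), and alpha + beta acts as a pseudo-count of prior observations.

```python
def informative_beta_prior(expected_ctr, pseudo_count):
    """Return (alpha, beta) for a Beta prior whose mean is expected_ctr
    and whose strength equals pseudo_count prior observations."""
    alpha = expected_ctr * pseudo_count
    beta = (1.0 - expected_ctr) * pseudo_count
    return alpha, beta

# Encode a 2% baseline CTR with the weight of 100 prior impressions,
# i.e. roughly Beta(2, 98), instead of the uninformative Beta(1, 1).
alpha, beta = informative_beta_prior(0.02, 100)
```

A larger pseudo-count makes the bandit slower to move off the prior, which is exactly the behavior you want when impressions per advt. are scarce.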
[1] https://en.wikipedia.org/wiki/Interim_analysis
[2] https://arxiv.org/abs/1811.04383