plants | 2 years ago
Specifically for A/B or A/B/N testing, you can use beta-Bernoulli bandits, which give you confidence about which experience is best and will converge on the optimal experience faster than a standard hypothesis test. The challenge is that you have to frequently recompute which experience is currently best and dynamically reallocate your traffic accordingly. They also only work on a single metric, so if your overall evaluation criterion isn't just something like "click-through rate", this type of testing becomes more difficult (if anyone else knows how multiple competing metrics are optimized with bandits, feel free to chime in).
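For the curious, here's a minimal sketch of the idea (Thompson sampling over a beta-Bernoulli bandit; class and method names are my own, not from any particular library):

```python
import random

class BetaBernoulliBandit:
    """Thompson sampling over N arms (e.g. N page variants in an A/B/N test)."""

    def __init__(self, n_arms, prior_alpha=1.0, prior_beta=1.0):
        # Beta(1, 1) is uniform: no initial preference between arms.
        self.alpha = [prior_alpha] * n_arms
        self.beta = [prior_beta] * n_arms

    def select_arm(self):
        # Sample a plausible CTR from each arm's posterior, play the argmax.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, clicked):
        # Conjugate update: a success bumps alpha, a failure bumps beta.
        if clicked:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1
```

Because `select_arm` samples from the posterior rather than picking a fixed split, traffic reallocates itself toward the winning arm automatically as evidence accumulates.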
abhgh | 2 years ago
There are some caveats though - and I mention these from the experience of running such solutions at a large scale in production. First, BB-MABs can't adapt to context by design: they only look at click/no-click behavior across the whole population. So if your population has two distinct segments - youth and elderly - who behave very differently wrt purchases, the BB-MAB won't pick a different winning advt. per group; it's blind to these groups.
The solution is to use something like a contextual MAB, which assimilates user features (or whatever else you might throw at it) into the bandit. There are simple ways to adapt plain MABs to the contextual setup [2] (in my experience these can also be effective), but of course the literature in this area is wide and deep.
A second caveat is that if the ratio of the size of the pool of advts. to the number of impressions is high, the BB-MAB won't converge, or will converge to a poor optimum; the search space is simply too large relative to the data. In cases like this it becomes important to begin with the right Beta priors, instead of the standard recipe of starting with a Beta that looks like a uniform distribution.
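One convenient way to encode such a prior (a common parameterization, sketched here with a hypothetical helper) is by its mean and an effective sample size: Beta(alpha, beta) has mean alpha / (alpha + beta), and alpha + beta acts as a pseudo-count of prior observations.

```python
def informative_beta_prior(expected_ctr, pseudo_count):
    """Return (alpha, beta) for a Beta prior whose mean is expected_ctr
    and whose strength equals pseudo_count prior observations."""
    alpha = expected_ctr * pseudo_count
    beta = (1.0 - expected_ctr) * pseudo_count
    return alpha, beta

# Encode a 2% baseline CTR with the weight of 100 prior impressions,
# i.e. roughly Beta(2, 98), instead of the uninformative Beta(1, 1).
alpha, beta = informative_beta_prior(0.02, 100)
```

A larger pseudo-count makes the bandit slower to move off the prior, which is exactly the behavior you want when impressions per advt. are scarce.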
[1] https://en.wikipedia.org/wiki/Interim_analysis
[2] https://arxiv.org/abs/1811.04383