
elehack | 6 years ago

Yes. Bandits will often converge more quickly to the optimal strategy, but it is much more difficult to understand why that strategy is optimal and generalize from the bandit outcomes to predict future performance and performance of other strategies.
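For concreteness, here is a minimal epsilon-greedy bandit sketch (the conversion rates, epsilon, and round count are made-up illustration values, not from any real experiment) showing how a bandit shifts traffic toward whichever arm looks best as it learns, rather than splitting traffic evenly the way an A/B test does:

```python
import random

def epsilon_greedy(true_rates, epsilon=0.1, rounds=10_000, seed=0):
    """Explore a random arm with probability epsilon; otherwise
    exploit the arm with the best observed conversion rate so far."""
    rng = random.Random(seed)
    pulls = [0] * len(true_rates)
    wins = [0] * len(true_rates)
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_rates))  # explore
        else:
            # exploit: unpulled arms get priority (inf), then best observed rate
            arm = max(range(len(true_rates)),
                      key=lambda a: wins[a] / pulls[a] if pulls[a] else float("inf"))
        pulls[arm] += 1
        wins[arm] += rng.random() < true_rates[arm]  # simulated conversion
    return pulls, wins

pulls, wins = epsilon_greedy([0.05, 0.06])
```

Note how the final `pulls` counts are themselves a product of the adaptive policy, which is exactly why the per-arm data is harder to analyze with standard fixed-allocation statistics.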

It isn't impossible - bandits are seeing adoption in medical trials to avoid precisely the problem discussed - but the standard experiment design and analysis techniques you learn in a decent college statistics class or introductory statistics text no longer apply. That's one of the beauties of A/B testing: while it does require substantial thought to do well, the basic statistics of the setup are very well-understood at this point.
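The "well-understood statistics" for a fixed two-arm A/B test are essentially the classic two-proportion z-test. A minimal sketch (the conversion counts below are made-up example numbers):

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test with a pooled standard error,
    the textbook analysis for a fixed-allocation A/B test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF (via erf)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

z, p = two_proportion_ztest(200, 4000, 260, 4000)  # 5.0% vs 6.5% conversion
```

This analysis assumes the sample sizes were fixed in advance; once a bandit adaptively reallocates traffic based on interim results, that assumption breaks and the test's error guarantees no longer hold as stated.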


edmundsauto | 6 years ago

But for results to generalize, or for you to understand why, the confounders must be accounted for in the randomization. This is really hard to do well -- there are often subtle influences whose effect on these non-linear systems isn't well understood. What makes someone convert? A million different factors; changing the color of a button in one context doesn't necessarily tell me much about how people would respond to that experience in another context.

It's easy to underestimate how complex these systems are, because we only see some superficial aspects of e.g. a user/software interaction model. That blind spot is baked into how our brains work -- ref Kahneman's "What you see is all there is".

orasis | 6 years ago

I disagree. I’ve spent a lot of time staring at bandit outcomes and usually they match some sort of intuition of why a variant might be exceptional.

comicjk | 6 years ago

That could be post-hoc reasoning, though. It would be interesting to pre-register your hypotheses, or see whether you could tell bandit outcomes from random ones.