(no title)
thegginthesky | 1 year ago
At my company we run very time-sensitive A/B tests with very few data points (at most 20 conversions per week, after 1,000 or so failures).
We found that Bayesian A/B testing was excellent for our needs, as it could be run with fewer data points than a regular A/B test for the sort of conversion changes we aim for. It gives the probability that group B converts better than group A, and we can run checks to decide whether to stop the test.
Regular A/B tests would take too long, and the significance of the test wouldn't make much sense, because after a few weeks we would be comparing apples to oranges.
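For readers who want to see what "probability of group B converting better than A" looks like in practice, here is a minimal sketch. It is not the commenter's actual method: it assumes independent uniform Beta(1, 1) priors on each arm's conversion rate and estimates the probability by sampling from the two posteriors. The function name `prob_b_beats_a` and the counts are hypothetical, chosen only to match the scale mentioned above.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A).

    Assumption: independent Beta(1, 1) priors, so each posterior is
    Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical week of data: 20 vs. 30 conversions out of 1,020 users per arm.
p = prob_b_beats_a(conv_a=20, n_a=1020, conv_b=30, n_b=1020)
print(f"P(B > A) is roughly {p:.2f}")
```

A stopping check could then compare this probability with a threshold chosen in advance, though the choice of threshold and how often to check are separate design decisions.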
e10v_me | 1 year ago
Most probably, in your case, higher sensitivity (or power) comes at the cost of a higher type I error rate. And this might be fine. Sometimes making more changes, faster, matters more than avoiding false positives. In this case, you can just use a higher p-value threshold (significance level) in the NHST framework.
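As a rough illustration of that last point, here is a sketch of a pooled two-proportion z-test evaluated at several significance levels. It uses the normal approximation and only the Python standard library. The function name `two_proportion_p_value` and the counts are hypothetical, and this is not how tea-tasting computes its results.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation)."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical data: 20 vs. 30 conversions out of 1,020 users per arm.
p_value = two_proportion_p_value(conv_a=20, n_a=1020, conv_b=30, n_b=1020)
for alpha in (0.05, 0.10, 0.20):
    verdict = "reject H0" if p_value < alpha else "do not reject H0"
    print(f"alpha={alpha:.2f}: p={p_value:.3f}, {verdict}")
```

Raising alpha makes the test reject more readily on the same data, which is exactly the trade of more false positives for more power.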
You might argue that the discrete type I error does not concern you, and that the potential loss in metric value is what matters. This might be true in your setting. But in most real-life scenarios there are additional costs that the proposed solution does not take into account: increased complexity, and more time spent on development, implementation, and maintenance.
I suggest reading this old post by David Robinson: https://varianceexplained.org/r/bayesian-ab-testing/
While the approach might fit your setting, I don't believe most other users of tea-tasting would benefit from it. For the moment, I must decline your kind contribution.
But you can still use tea-tasting and perform the calculations described in the whitepaper. See the guide on how to define a custom metric with a statistical test of your choice: https://tea-tasting.e10v.me/custom-metrics/