

grega5 | 2 years ago

First, you really should move away from frequentist statistical testing and use Bayesian statistics instead. It is perfect for occasions where you want to adjust your beliefs about which UX is best based on empirical data. By collecting data you increase confidence in your decision rather than trying to meet an arbitrary criterion such as a specific p-value.

Second, the “run-in-parallel” approach has a well-defined name in experimental design: factorial design. The diagram shown is an example of a full factorial design, in which each level of each factor is combined with each level of all other factors. The advantage of such a design is that interactions between factors can be tested as well. If there are good reasons to believe that there are no interactions between the different factors, then you could use a partial (fractional) factorial design, which has fewer total combinations of levels while still enabling estimation of the effects of individual factors.
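The difference between a full and a partial factorial design can be sketched in a few lines of Python. The factor names and levels below are hypothetical, invented only for illustration, and the half fraction shown is one common construction for two-level factors (keep the runs with an even number of "second" levels). It is only trustworthy under the no-interaction assumption mentioned above, because main effects become aliased with two-factor interactions.

```python
from itertools import product

# Hypothetical factors and levels, invented for illustration only.
factors = {
    "button_color": ["blue", "green"],
    "headline": ["short", "long"],
    "layout": ["one_column", "two_column"],
}

def full_factorial(factors):
    """Combine every level of each factor with every level of all others."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

def half_fraction(factors):
    """Half fraction of a two-level design.

    Keeps only the runs in which an even number of factors sit at their
    second level. Each level of each factor still appears equally often,
    so main effects can be estimated if interactions are negligible.
    """
    names = list(factors)
    runs = []
    for idx in product(range(2), repeat=len(names)):
        if sum(idx) % 2 == 0:
            runs.append({n: factors[n][i] for n, i in zip(names, idx)})
    return runs

full = full_factorial(factors)
half = half_fraction(factors)
print(len(full), len(half))  # prints "8 4"
```

With three two-level factors, the full design needs 8 variants and the half fraction needs 4, while every level of every factor is still shown in exactly two variants.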


scottfr|2 years ago

Disagree on using Bayesian statistics. Frequentist statistics are perfect for A/B testing.

There are so many strong biases people have about different parts of UI/UX. One of the significant benefits of A/B testing is that it lets you move ahead as a team and make decisions even when there are strongly differing opinions on your team. In these cases you can just "A/B test" and let the data decide.

But if you are using Bayesian approaches, you'll shift those internal arguments to what the prior should be, and it will be harder to get alignment based on the data.

eru|2 years ago

Not necessarily.

You can present your Bayesian approaches in such a way that it's almost independent of the prior. Your output will be 'this experiment should shift your odds-ratio by so-and-so-many logits in this or that direction' instead of an absolute probability.
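One way to picture a prior-independent report is the sketch below. It assumes two simple point hypotheses about a variant's conversion rate (12% versus 10%) and made-up counts, all chosen purely for illustration. Under that assumption, the evidence is a log likelihood ratio that gets added to whatever prior log-odds a reader starts from, so the shift itself does not depend on the prior.

```python
import math

def log_likelihood_ratio(successes, trials, p1, p0):
    """Log of P(data | rate = p1) / P(data | rate = p0) for binomial data.

    The binomial coefficient cancels in the ratio, so it is omitted.
    The result is in natural-log odds units.
    """
    failures = trials - successes
    return (successes * math.log(p1 / p0)
            + failures * math.log((1 - p1) / (1 - p0)))

def prob_to_log_odds(prob):
    return math.log(prob / (1 - prob))

def log_odds_to_prob(log_odds):
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical data: 130 conversions out of 1000 visitors.
shift = log_likelihood_ratio(successes=130, trials=1000, p1=0.12, p0=0.10)

# The same shift is applied to three different (hypothetical) priors.
for prior_prob in (0.2, 0.5, 0.8):
    posterior = log_odds_to_prob(prob_to_log_odds(prior_prob) + shift)
    print(f"prior {prior_prob:.1f} -> posterior {posterior:.3f} (shift {shift:+.2f})")
```

Team members with different priors end up with different posteriors, but they can all agree on the size and direction of the shift.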

JHonaker|2 years ago

You have to make almost the exact same choices when you rely on frequentist tools. The main difference is they’re pre-made during the development of the tool, so you don’t get insight into what they are without studying the theory behind the test.

grega5|2 years ago

We can agree to disagree. My claim is actually quite the reverse. For A/B testing specifically, Bayes is much better suited to address the practical questions you would usually have when running A/B experiments. See my response to AlexeyMK below.

miksumiksu|2 years ago

Fixing dysfunctional decision making by delegating it to "data", what could go wrong? Might as well flip a coin and save the money.

AlexeyMK|2 years ago

Thanks for factorial design! I'll update the post to the proper nomenclature.

The frequentist/Bayesian debate is not one I understand well enough to opine on - do you have any reading you'd recommend for this topic?

grega5|2 years ago

I myself am a rather recent convert to using Bayesian statistics, for the simple reason that I was trained in and have used frequentist statistics extensively in the past, and I had no experience using Bayesian statistics. Once you take the time to master the basic tools, it becomes quite straightforward to use. I am currently away from my computer and resources, which makes it difficult to suggest reading. As a somewhat shameless plug, you could check the https://www.frontiersin.org/articles/10.3389/fpsyg.2020.0094... paper and the related R package https://cran.r-project.org/web/packages/bayes4psy/index.html and GitHub repository https://github.com/bstatcomp/bayes4psy, which were made to be accessible to users with frequentist statistics experience.

To brutally simplify the distinction: using frequentist statistics and testing, you are addressing the question of whether, based on the results, you can reject the hypothesis that there is no difference between two conditions (e.g., A and B in A/B testing). The p-value broadly gives you the probability of observing a difference at least as large as the one in your data if A and B were sampled from the same distribution. If this is really low, then you can reject the null hypothesis and claim that there is a statistically significant difference between the two conditions.
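For conversion-style A/B data, the frequentist test described here is often a pooled two-proportion z-test. The sketch below is a minimal standard-library version under that assumption. The counts are hypothetical, and it does not guard against the degenerate case where no one or everyone converts.

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for H0: A and B share one conversion rate.

    Uses the pooled z-test with a normal approximation, which assumes
    reasonably large samples in both groups.
    """
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: conversions and visitors per variant.
small_gap = two_proportion_p_value(120, 1000, 100, 1000)
large_gap = two_proportion_p_value(150, 1000, 100, 1000)
print(f"12% vs 10%: p = {small_gap:.3f}")
print(f"15% vs 10%: p = {large_gap:.4f}")
```

Note that the output only says how surprising the data would be if there were no difference. It does not say how likely it is that A is better than B.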

In comparison, using Bayesian statistics, you can estimate the probability of a specific hypothesis, e.g. the hypothesis that A is better than B. You start with a prior belief (prior) in your hypothesis and then compute the posterior probability, which is the prior adjusted for the additional empirical evidence that you have collected. The results that you get can help you address a number of questions. For instance, (i) what is the probability that in general A leads to better results than B? Or related (but substantially different), (ii) what is the probability that for any specific case using A you have a higher chance of success than using B? To illustrate the difference, the probability that men in general are taller than women approaches 100%. However, if you randomly pick a man and a woman, the probability that the man will be taller than the woman is substantially lower.
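The two kinds of question can be sketched as follows. For (i), the sketch assumes conversion data with independent Beta(1, 1) priors and estimates the posterior probability by Monte Carlo. For the height illustration behind (ii), it assumes normally distributed heights with made-up means and standard deviations. All numbers are hypothetical.

```python
import random
from statistics import NormalDist

def prob_a_beats_b(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Question (i): posterior probability that A's rate exceeds B's.

    Assumes independent Beta(1, 1) priors, so each posterior is
    Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_a > rate_b
    return wins / draws

def prob_random_pick_exceeds(mean_x, sd_x, mean_y, sd_y):
    """Question (ii) flavour: P(X > Y) for one random draw from each of
    two independent normal populations."""
    diff = NormalDist(mean_x - mean_y, (sd_x ** 2 + sd_y ** 2) ** 0.5)
    return 1 - diff.cdf(0)

# Hypothetical A/B counts.
print(prob_a_beats_b(150, 1000, 100, 1000))

# Hypothetical height parameters in cm (means and standard deviations).
print(prob_random_pick_exceeds(178, 7.0, 165, 6.5))
```

Even when the group-level comparison is close to certain, the individual-level probability stays noticeably lower, because it also reflects the spread within each group.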

In your A/B testing, if the cost of A is higher, addressing question (ii) would be more informative than question (i). You can be quite sure that A is in general better than B; however, is the difference big enough to offset the higher cost?

Related to that, in Bayesian statistics, you can define the Region of Practical Equivalence (ROPE) - in short, the difference between A and B that could be due to measurement error, or that would be of no use in practice. You can then check in what proportion of cases the difference would fall within the ROPE. If the proportion of cases is high enough (e.g. 90%), then you can conclude that in practice it makes no difference whether you use A or B. In frequentist terms, Bayes allows you to confirm a null hypothesis, something that standard frequentist significance testing cannot do.
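A ROPE check can be sketched with the same kind of posterior sampling. The example assumes conversion data, independent Beta(1, 1) priors, and a ROPE of plus or minus one percentage point. The ROPE width and all counts are hypothetical choices, not recommendations.

```python
import random

def proportion_in_rope(conv_a, n_a, conv_b, n_b,
                       rope=0.01, draws=20000, seed=0):
    """Share of posterior draws where |rate_A - rate_B| <= rope.

    Assumes independent Beta(1, 1) priors on both conversion rates.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        inside += abs(rate_a - rate_b) <= rope
    return inside / draws

# Hypothetical large test with nearly identical rates: mostly inside the ROPE.
print(proportion_in_rope(10000, 100000, 10050, 100000))

# Hypothetical smaller test with a clear gap: mostly outside the ROPE.
print(proportion_in_rope(150, 1000, 100, 1000))
```

If the first proportion clears the chosen threshold (for example 90%), the practical conclusion is that A and B are interchangeable.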

In regards to priors - which another person has mentioned - if you do not have a specific reason to believe beforehand that A might be better than B or vice versa, you can use a relatively uninformative prior, basically saying, “I don’t really have a clue which might be better”. So the issue of priors should not discourage you from using Bayesian statistics.
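How much the prior matters can be checked directly. The sketch below assumes a Beta prior on a single conversion rate, where the posterior mean has a closed form, and compares a flat Beta(1, 1) prior with a hypothetical skeptical Beta(10, 90) prior centred on 10%. The counts are made up.

```python
def posterior_mean(conversions, trials, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a conversion rate under a Beta(prior_a, prior_b) prior.

    With binomial data the posterior is
    Beta(prior_a + conversions, prior_b + trials - conversions).
    """
    return (prior_a + conversions) / (prior_a + prior_b + trials)

# Hypothetical data: 120 conversions out of 1000 visitors.
flat = posterior_mean(120, 1000)                           # Beta(1, 1) prior
skeptical = posterior_mean(120, 1000, prior_a=10, prior_b=90)
print(f"flat prior: {flat:.4f}, skeptical prior: {skeptical:.4f}")
```

With a thousand observations the two estimates differ by only a fraction of a percentage point, and the gap shrinks further as more data comes in.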