top | item 47047130

(no title)

sdenton4 | 12 days ago

You're measuring binary outcomes, so you can use a beta distribution to understand the distribution of possible success rates given your observations, and thereby provide a confidence interval on the observed success rates. This week help us see whether that 4% success rate is statistically significant, or if it is likely to be noise.

discuss

bee_rider|12 days ago

I’ve only ever gotten, like, slight wording suggestions from reviewers. I wish they would write things like this instead—it is possibly meaningful and eminently do-able (doesn’t even require new data!).

sdenton4|12 days ago

Taking a slightly closer look at the paper, you've got K repositories and create a set of test cases within each repository, totaling 130-ish tests. There may be some 'repository-level' effects - ie, tasks may be easier in some repo's than others.

Modeling the overall success rate then requires some hierarchical modeling. You can consider each repository as a weighted coin, and each test within a repository as flip of that particular coin. You want to estimate the overall probability of getting heads, when choosing a coin at random and then flipping it.

Here's some Gemini hints on how to proceed with getting the confidence interval using hierarchical bayes: https://gemini.google.com/corp/app/e9de6a12becc57f6

(Still no need for further data!)