top | item 46738031


ratelimitsteve | 1 month ago

When I took stats in high school, our final had one set of data and two questions: one asked us to use that data to prove a hypothesis, the other to use the same data to disprove it. The trick was manipulating confidence, and the real trick was realizing that this is almost certainly happening any time we let someone else make the critical decisions about how to crunch a given dataset.

There seems to be an unspoken translation layer: you ask a question in English, use statistics to translate that question from English to math, then apply the translated question to the dataset and get an answer. If the translation were mechanical, that would be fine, but the English often has to go through interpretation. How confident do we have to be in a "yes" answer? 99%? 95%? 50/50? What constitutes an outlier? Sometimes a very slight shift in the line between good and bad data will include or exclude a single data point that changes the calculations. The problem is the lack of standardization in this translation layer as much as anything else: everyone is ostensibly trying to answer the same question, but under the hood they're asking very different questions of the data.
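The single-data-point claim is easy to demonstrate. Below is a minimal sketch in Python: the dataset, the baseline value, and the decision to treat one low reading as an outlier are all invented for illustration, not taken from the comment. A one-sample t test on the same numbers rejects or retains the null hypothesis depending solely on whether one suspect point is kept.

```python
import math

# Illustrative question: does the sample mean exceed a baseline of 2.0?
# All values here are made up to show the effect of one point.
BASELINE = 2.0
core = [2.2, 2.3, 2.1, 2.25, 2.15]
suspect = 0.5  # one low reading: "outlier", or legitimate data?

def t_statistic(data, mu0):
    """One-sample t statistic: (mean - mu0) / (s / sqrt(n))."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)  # sample variance
    return (mean - mu0) / math.sqrt(var / n)

# One-sided critical values at alpha = 0.05, from a standard t table,
# keyed by degrees of freedom (n - 1).
T_CRIT = {4: 2.132, 5: 2.015}

t_excl = t_statistic(core, BASELINE)              # df = 4, point excluded
t_incl = t_statistic(core + [suspect], BASELINE)  # df = 5, point included

print(f"excluding the point: t = {t_excl:.2f} vs {T_CRIT[4]} -> "
      f"{'reject' if t_excl > T_CRIT[4] else 'retain'} H0")
print(f"including the point: t = {t_incl:.2f} vs {T_CRIT[5]} -> "
      f"{'reject' if t_incl > T_CRIT[5] else 'retain'} H0")
```

With the point excluded, t is about 5.66 and the null is rejected; with it included, t goes negative and the null is retained. Nothing about the analysis changed except where the analyst drew the line between good and bad data.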

