Most commonly used statistical tests and implementation in R

88 points| nafizh | 10 years ago |r-statistics.co | reply

32 comments

[+] IndianAstronaut|10 years ago|reply
Shapiro-Wilk isn't all that useful with practical data unless your sample sizes are fairly small. Once you deal with anything above 5000 values, you are better off with QQ plots.
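A rough sketch of that trade-off in R, using made-up data (the t-distributed sample and the seed are assumptions chosen only for illustration). It runs shapiro.test() at its upper size limit, shows that the function refuses anything larger, and then draws the QQ plot:

```r
set.seed(42)

# Hypothetical sample: t-distributed with 20 df, i.e. only mildly non-normal
x <- rt(5000, df = 20)

# Shapiro-Wilk at the largest sample size the function accepts
sw <- shapiro.test(x)
print(sw$p.value)

# shapiro.test() raises an error for more than 5000 observations;
# capture the message instead of stopping the script
too_big <- tryCatch(
  shapiro.test(rt(5001, df = 20)),
  error = function(e) conditionMessage(e)
)
print(too_big)

# A QQ plot works at any sample size and shows where the departure is
qqnorm(x)
qqline(x)
```

The plot tells you whether the deviation sits in the tails or the middle and how large it is, which a single p-value cannot.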
[+] ekianjo|10 years ago|reply
> If the p-Value is less than significance level (ideally 0.05),

Erm, no. P=0.05 is borderline meaningless. There could be as much as a 30% chance you are wrong about the actual difference being there, depending on the true probability of the initial hypothesis.

P-values should be used with strong caution.
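One way to read that figure, as a back-of-the-envelope sketch rather than the commenter's own derivation: among results that clear the 0.05 threshold, work out what share are false positives for a given prior probability that the effect is real. The power of 0.8 and the three priors below are assumptions picked for illustration.

```r
# Share of "significant" results that are false positives, given
#   prior: assumed probability that the effect is real
#   alpha: significance threshold
#   power: assumed probability of detecting a real effect
false_positive_share <- function(prior, alpha = 0.05, power = 0.8) {
  false_pos <- alpha * (1 - prior)  # no real effect, but test rejects
  true_pos  <- power * prior        # real effect, and test rejects
  false_pos / (false_pos + true_pos)
}

priors <- c(0.5, 0.2, 0.1)
round(false_positive_share(priors), 3)
# [1] 0.059 0.200 0.360
```

Under these assumptions, a prior somewhere between 0.1 and 0.2 puts the share of wrong "discoveries" in the 20% to 36% range, even though every one of them passed p < 0.05.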

[+] snydly|10 years ago|reply
> P-values should be used with strong caution.

FiveThirtyEight (and Scientific American, and others) ran some pretty interesting articles about this recently, if you haven't seen them:

http://fivethirtyeight.com/features/science-isnt-broken/

Just from personal experience, the use of p-values is really broken in biology/chemistry. The things I've heard principal investigators say...

[+] cetacea|10 years ago|reply
Even better, p-values should not be used at all. If I have data in hand, I want to use it to find out the probability that my hypothesis is true. But p-value analysis requires me to instead ask a different question that I don't really care about, involving whether my data are consistent with the null hypothesis.

Everything is just so much more sensible if you allow yourself to assign probabilities to hypotheses, rather than assuming a hypothesis from the outset and computing opaque statistics relating to your data.

[+] GFK_of_xmaspast|10 years ago|reply
> could be as much as a 30% chance you are wrong about the actual difference being there, depending on the true probability of the initial hypothesis.

I'm having trouble parsing this. Are you talking about the power of the test?

[+] minimaxir|10 years ago|reply
It's also worth looking at the R documentation for each of the functions. (You can pull it up from the console with ?chisq.test, for example.)

For example, chisq.test has optional built-in Monte Carlo testing, and none of the other functions do, oddly.
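A minimal sketch of that option, using a made-up 2x2 table with small counts (the table, seed, and replicate count are assumptions for illustration). The simulate.p.value and B arguments are the ones documented in ?chisq.test:

```r
set.seed(1)

# Hypothetical 2x2 contingency table with small cell counts
tab <- matrix(c(3, 1, 1, 4), nrow = 2)

# Default test: relies on the chi-squared approximation, which R
# warns about when expected counts are small
asym <- suppressWarnings(chisq.test(tab))

# Monte Carlo version: p-value estimated from B simulated tables
mc <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)

print(asym$p.value)
print(mc$p.value)
print(mc$method)
```

With counts this small the two p-values can differ noticeably, which is the situation where the simulated version is worth reaching for.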

[+] cloakanddagger|10 years ago|reply
This is a great post! Bookmarking this for future reference.
[+] hackaflocka|10 years ago|reply
This is a good resource for those new to R.

R has some really good GUI layers now. I struggled and struggled for years trying to learn the command line methods, but it was too much for me. The following do a great job (they are alternatives to one another):

- Deducer

- R Commander

- RKWard

[+] earino|10 years ago|reply
It seems like this list is incomplete without mentioning that both RStudio[1] and Jupyter[2] notebooks now have first-class support for R. Two upstarts, Rodeo[3] and Beaker[4], are doing cool stuff as well.

The company I work for, Domino Data Lab[5], lets you fire up a lot of these notebooks in a nice hosted environment on big cloud servers with minimal cost and effort. It's a fun way to learn how all these new environments can work together, from RStudio for exploratory analysis to Jupyter notebooks for presenting a topic. For the other two, I haven't really found a use case where they are superior. The tools in this space are just getting better and better.

1. https://www.rstudio.com/
2. http://jupyter.org/
3. http://blog.yhat.com/posts/introducing-rodeo.html
4. http://beakernotebook.com/
5. https://www.dominodatalab.com/

[+] pfh|10 years ago|reply

[deleted]