[+] [-] IndianAstronaut|10 years ago|reply
Shapiro-Wilk isn't all that useful with practical data unless your sample sizes are fairly small. Once you're dealing with anything above 5,000 values, you're better off with Q-Q plots.
[+] [-] ekianjo|10 years ago|reply
> If the p-Value is less than significance level (ideally 0.05),
Erm, no. p = 0.05 is borderline meaningless: depending on the prior probability of the initial hypothesis, there could be as much as a 30% chance that you're wrong about the actual difference being there.
P-values should be used with strong caution.
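To make the point above concrete, here is a quick back-of-the-envelope calculation. The specific numbers (a 10% prior probability that a real effect exists, and a test with 80% power) are illustrative assumptions, not anything from the thread; with them, a result that is "significant" at p < 0.05 is wrong far more than 5% of the time:

```python
# False discovery rate for a single test at alpha = 0.05.
# Assumed numbers (illustration only):
alpha = 0.05   # significance threshold, P(significant | no real effect)
power = 0.80   # P(significant | real effect)
prior = 0.10   # prior probability that a real effect exists

p_sig_and_false = alpha * (1 - prior)  # significant, but no real effect
p_sig_and_true = power * prior         # significant, and a real effect

# Among all "significant" results, what fraction are false alarms?
fdr = p_sig_and_false / (p_sig_and_false + p_sig_and_true)
print(round(fdr, 2))  # 0.36 -> over a third of "significant" findings are wrong
```

So under these (assumed) conditions the chance of being wrong given p < 0.05 is about 36%, not 5% — which is the gap between "the p-value" and "the probability my hypothesis is true".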
[+] [-] snydly|10 years ago|reply
FiveThirtyEight (and Scientific American, and others) published some pretty interesting articles about this recently, if you haven't seen them:
http://fivethirtyeight.com/features/science-isnt-broken/
Just from personal experience, the use of p-values is really broken in biology/chemistry. The things I've heard principal investigators say...
[+] [-] cetacea|10 years ago|reply
Even better, p-values should not be used at all. If I have data in hand, I want to use it to find out the probability that my hypothesis is true. But p-value analysis requires me to instead ask a different question that I don't really care about: whether my data are consistent with the null hypothesis.
Everything is just so much more sensible if you allow yourself to assign probabilities to hypotheses, rather than assuming a hypothesis from the outset and computing opaque statistics relating to your data.
[+] [-] GFK_of_xmaspast|10 years ago|reply
I'm having trouble parsing this; are you talking about the power of the test?
[+] [-] minimaxir|10 years ago|reply
For example, chisq.test has optional built-in Monte Carlo testing (via its simulate.p.value argument), and oddly none of the other test functions do.
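For readers unfamiliar with what that option does: instead of comparing the chi-squared statistic to its asymptotic distribution, it simulates tables under the null hypothesis and counts how often the simulated statistic is at least as extreme as the observed one. Here is a rough sketch of the idea in Python (the function name `monte_carlo_chisq` and the one-way goodness-of-fit setup are my own simplification, not R's actual implementation):

```python
import random

def chisq_stat(observed, expected):
    """Pearson's chi-squared statistic for one set of counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def monte_carlo_chisq(observed, probs, n_sim=2000, seed=42):
    """Monte Carlo p-value for a one-way goodness-of-fit test,
    in the spirit of R's chisq.test(..., simulate.p.value = TRUE)."""
    rng = random.Random(seed)
    n = sum(observed)
    expected = [p * n for p in probs]
    stat = chisq_stat(observed, expected)
    hits = 0
    for _ in range(n_sim):
        # Draw n observations from the null category probabilities.
        counts = [0] * len(probs)
        for _ in range(n):
            r = rng.random()
            acc = 0.0
            for i, p in enumerate(probs):
                acc += p
                if r < acc:
                    counts[i] += 1
                    break
            else:  # guard against floating-point round-off
                counts[-1] += 1
        if chisq_stat(counts, expected) >= stat:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_sim + 1)
```

With roughly uniform counts like `[30, 30, 40]` against equal probabilities the simulated p-value comes out large, while a lopsided `[80, 10, 10]` yields a tiny one — the same qualitative behavior you get from the R option.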
[+] [-] cloakanddagger|10 years ago|reply
[+] [-] hackaflocka|10 years ago|reply
R has some really good GUI layers now. I struggled and struggled for years trying to learn the command-line methods, but it was too much for me. The following do a great job (they are alternatives to one another):
- Deducer
- R Commander
- RKWard
[+] [-] earino|10 years ago|reply
It seems like this list is incomplete without mentioning that both RStudio[1] and Jupyter[2] notebooks now have really first-class support for R. There are also two upstarts, Rodeo[3] and Beaker[4], doing cool stuff as well.
The company I work for, Domino Data Lab[5], lets you fire up a lot of these notebooks in a nice hosted environment on big cloud servers with minimal cost and effort. It's a fun way to learn how all these new environments can work together: RStudio for exploratory analysis, Jupyter notebooks for presenting a topic. For the other two I haven't really found the superior use case yet. The tools in this space are just getting better and better.
1. https://www.rstudio.com/
2. http://jupyter.org/
3. http://blog.yhat.com/posts/introducing-rodeo.html
4. http://beakernotebook.com/
5. https://www.dominodatalab.com/
[+] [-] pfh|10 years ago|reply
[deleted]