I gave up on R long ago. In Python, pandas, NumPy, SciPy, scikit-learn, and statsmodels are often enough to replace basic R functionality. When I need an actual R package, I've found it's almost always worth the time to build a wrapper with rpy2 or to use the IPython notebook's rmagic extension.
On the note of building wrappers: it's still a good idea to rpy2-wrap even basic statistical tests that exist in both ecosystems. The R functions are battle-tested and have been looked over by far more statisticians and mathematicians than their Python counterparts (or so it seems).
I gave up on Python long ago. Anything I want to do in Python, I can do just as easily in R.
Rcpp provides a great environment for intermingling high-performance C++ with expressive R code. R has all the features you'd want for a modern development environment: good IDE, unit testing, documentation conventions, ... It's easy to turn analyses into interactive apps with shiny. There's a package for every model you can think of. You can connect to databases, talk to web APIs, and scrape web pages.
>The documentation is inanely bad. I can't explain it.
I'm surprised that the author is saying this as I've experienced exactly the opposite. R completely documents all the arguments and outputs of its functions, and documentation is easy to pull up by function, and this is almost universal both for distribution and community packages. Additionally the documentation often includes vignettes that show full examples.
In contrast, Python documentation most often lives on long pages that mention functions but do not describe their arguments or output. I've found almost no Python documentation to be adequate, outside of some of the core functions. And when it is adequate, it's exceedingly verbose and lacking in examples, basically the worst of all worlds.
I agree. I've read other gripes about R function documentation, but it's among the better documentation for community software. Python's documentation seems focused on implementation from a programmer's perspective and is often less helpful for actually applying the function.
"This also means that you shouldn't ever assign useful quantities to variables named T and F. Sorry. Other variable names that you cannot use are c, q, t (!), C, D, and I."
Note the contradiction of that limitation and the name of the language. Makes the name even more exceptional.
Is he right? What's with the scope? Can't I introduce a new T in my function thus just hiding the global one from it, but otherwise not disturbing anything? (I don't know R, I'm just asking, reading that the variables have the function scope)
Yes, all those single-letter names are just ordinary variables that you can overwrite. Doing so is nearly always a terrible idea.
The article is being a little unclear when it says "cannot use". You can use literally any variable name in R if you really want to. If the name you want is already a reserved word (e.g. "for", "else", "function"), or if it is not a syntactically valid token (e.g. '@!":%$>"@;'), then you just have to enclose it in backquotes. So the following is valid R:
If I ever strike it rich, I swear to god I'm donating $5,000,000 to the cause of reaching total feature parity between the best of R's packages and NumPy/SciPy.
You are underestimating the price of that by at least an order of magnitude. That is one of the biggest reasons people like R. R also has the notion of NA, which is different from NaN, built into the language.
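To make the NA versus NaN distinction concrete from the Python side, here is a minimal sketch using only the standard library. It assumes `None` as a stand-in for a "value not recorded" marker; the names `MISSING` and `mean_skip_missing` are hypothetical and not part of any library. It is meant to illustrate the idea, not to reproduce R's exact semantics.

```python
import math

# Assumption: None plays the role of "value was never recorded".
# NaN, by contrast, is an ordinary float produced by undefined arithmetic.
MISSING = None


def mean_skip_missing(values):
    """Mean of the recorded values, skipping MISSING entries only."""
    present = [v for v in values if v is not MISSING]
    if not present:
        return MISSING  # nothing recorded, so the mean is itself missing
    return sum(present) / len(present)


readings = [1.0, MISSING, 3.0]   # one observation was never recorded
with_nan = [1.0, math.nan, 3.0]  # one observation is an undefined number

print(mean_skip_missing(readings))  # the missing entry is skipped
print(mean_skip_missing(with_nan))  # NaN is a real float, so it propagates
```

Because plain Python floats have no built-in missing-value marker, the missing case has to be tracked by hand like this, which is part of why parity is expensive.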
The other huge reason for R adoption is it makes running stat analyses very simple, so for all the people who aren't programmers, and don't wish to be programmers, R is an awesome choice. The ability to, in 3 simple lines of R, load data from a csv, run a glm, and get a sophisticated report on the model is awesome.
The worst part of R is that array indices begin at 1, and indexing an array at 0 fails silently by returning a zero-length vector instead of raising an error. I've spent many a night trying to figure out why all my data is wrong because my_array[0] * frame$column is returning the wrong numbers.
This gets brought up again and again with no agreement. The only advice I know how to give you is this: when working in any language that has mathematics as its primary focus (Mathematica, MATLAB, Julia, R), use 1-based indexing, and in other languages use 0-based indexing.
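For anyone moving between the two conventions, one defensive option is to make the off-by-one case fail loudly rather than silently. A minimal Python sketch follows; the helper name `get_one_based` is hypothetical, chosen only for this illustration.

```python
def get_one_based(seq, i):
    """Return the i-th element of seq, counting from 1.

    Indices below 1 raise IndexError instead of quietly returning
    something unexpected.
    """
    if i < 1:
        raise IndexError(f"1-based index must be >= 1, got {i}")
    return seq[i - 1]


values = [10, 20, 30]
print(get_one_based(values, 1))  # first element under 1-based counting
```

Calling `get_one_based(values, 0)` raises an `IndexError`, so the mistake surfaces at the point of the bad index rather than later in the results.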
The worst point of R is debugging. R scripts often fail without reporting any line number, even if the script starts with "options(error=traceback)". And I have never seen line numbers for warnings.
So you get errors, crashes, and warnings, and the only way to debug R is to inject message() statements all over the code.
[+] [-] nkrumm|11 years ago|reply
[+] [-] hadley|11 years ago|reply
[+] [-] epistasis|11 years ago|reply
[+] [-] mapcar|11 years ago|reply
[+] [-] acqq|11 years ago|reply
http://tim-smith.us/arrgh/atomic.html
[+] [-] rcthompson|11 years ago|reply
[+] [-] 0942v8653|11 years ago|reply
[+] [-] hadley|11 years ago|reply
[+] [-] weissguy|11 years ago|reply
[+] [-] otoburb|11 years ago|reply
[+] [-] x0x0|11 years ago|reply
[+] [-] IndianAstronaut|11 years ago|reply
[+] [-] alilja|11 years ago|reply
[+] [-] princeb|11 years ago|reply
[+] [-] kephra|11 years ago|reply
[+] [-] hadley|11 years ago|reply
[+] [-] joyofdata|11 years ago|reply
The opposite is true in my experience.
> R makes me want to kick things almost every time I use it.
Maybe R is not your biggest problem.
> The documentation is inanely bad. I can't explain it.
Good point!