I think open source eventually replaces commercial products, in the same way that proprietary products become commoditized. The response for commercial vendors is also the same: continual differentiation, adding new features, benefits, support, documentation, etc. The exceptions are also the same: natural monopolies (e.g., strong network effects).
Open source is great at hill-climbing: by tapping the collective intelligence of its users, it excels where there are clear directions for improvement, especially for features that users obviously need (provided the project's structure is modular enough to facilitate contributions).
It's not great at "hill-hopping": originating radically different products.
Counter-examples abound. Can you name even one open source app that has displaced a mature, user-facing desktop app with a non-trivial UI, other than a web browser?
Open source only seems to win in domains in which it makes sense for companies to share work in order to compete at a higher tier of functionality.
I don't think it's obvious that open source displaces commercial software in scientific computing. For every example like R, which has in many places displaced S-PLUS, there are counterexamples like MATLAB, whose open-source clone Octave is a bad joke, at least the last time I tried using it: missing functions, slowness, extreme difficulty installing. The same goes for Mathematica, EViews, GAUSS, and Maple.
One other potential factor: a lot of this software is driven by academic use, either because academics use it directly or because academia is where people were first exposed to it, and academics often receive large discounts.
Anecdotally, NumPy (Python) has some traction. Likewise, the survey doesn't consider SQL libraries, and I'm sure there are statistical-analysis libraries for Java. According to the bar chart, R is mentioned by 45% of respondents, SQL by 32%, Python by 25%, and Java by 24%. That seems a more reasonable comparison to me than the graphs higher up in the post.
I use R as my primary data-analysis tool for almost all of my work, with occasional recourse to SAS for certain specialized models (e.g., PROC GLIMMIX for generalized linear mixed models).
My only complaint is the awful default IDE, which can be mitigated to a large extent by scripting elsewhere and source()ing the script. There are also some odd edge behaviors: the mystifying row names of data frames, the difficulty of dropping unused factor levels from aggregated or sliced data (another data-frame issue), and the perhaps unnecessary obscurity of some of the plotting functions (although holding R responsible for the lattice library is unfair).
All that said, for a free tool, it's extraordinary, and the authors of the base language and the many packages that I use have my gratitude.
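For what it's worth, the unused-factor-level annoyance has a one-call fix in base R (droplevels(), available since R 2.12), and the row names can be reset in the same breath. A sketch with made-up data:

```r
# Made-up data frame; slicing keeps all original factor levels around
df  <- data.frame(g = factor(c("a", "a", "b", "c")), x = 1:4)
sub <- df[df$g != "c", ]
levels(sub$g)            # still includes the unused "c" level

sub <- droplevels(sub)   # drop unused factor levels
rownames(sub) <- NULL    # and reset the mystifying row names
levels(sub$g)            # now just "a" "b"
```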
I love R, but I end up using Stata more often because it is easier to produce vector graphics that can be imported into Illustrator. I wish the R community would start to focus on graphics.
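For what it's worth, base R can already write editable vector output through its pdf() and svg() devices; whether the result round-trips cleanly into Illustrator is another matter. A minimal sketch using the built-in mtcars data:

```r
# Write a plot as a vector PDF; Illustrator can edit the resulting paths.
# useDingbats = FALSE avoids a symbol font that trips up some editors.
pdf("scatter.pdf", width = 5, height = 4, useDingbats = FALSE)
plot(mpg ~ wt, data = mtcars, pch = 19,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
dev.off()
# svg("scatter.svg", width = 5, height = 4) works the same way
# where cairo support is available
```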
> Robert A. Muenchen is the author of R for SAS and SPSS Users and, with Joseph M. Hilbe, R for Stata Users. He is also the creator of r4stats.com, a popular web site devoted to helping people learn R. Bob is a consulting statistician with 30 years of experience
Disclaimer: I hate R's syntax, but my company's analytics group uses R for just about everything.
Unfortunately, it's almost impossible to work with very large datasets in R because of its speed limitations. Many researchers I know use MATLAB because of this.
It's probably more an issue of easily pre-filtering/aggregating the data before analysing it with R. I like this approach of moving the calculation to the data, but we must be very late on the adoption curve if Oracle are doing it already.
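A sketch of that "move the calculation to the data" idea, using the DBI interface with an in-memory SQLite database standing in for a real server (this assumes the DBI and RSQLite packages are installed; the table and column names are made up):

```r
library(DBI)  # database interface; RSQLite provides the driver

con <- dbConnect(RSQLite::SQLite(), ":memory:")
big <- data.frame(grp = rep(c("a", "b"), each = 5000), value = rnorm(10000))
dbWriteTable(con, "big", big)

# Aggregate inside the database; only the two-row summary reaches R
small <- dbGetQuery(con,
  "SELECT grp, AVG(value) AS mean_value, COUNT(*) AS n
     FROM big GROUP BY grp")
dbDisconnect(con)
```

Against a real database server the dbConnect() call changes but the pattern stays the same: filter and aggregate in SQL, analyse the small result in R.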
For statistical genetics at least, it's common to process much of the data in parallel, so the RAM limitations on one R instance are not the gating factor.
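That pattern is easy to sketch with the parallel package that ships with R (since 2.14). The chromosome split below is hypothetical; the point is that each worker only ever holds its own slice:

```r
library(parallel)

# Hypothetical per-chromosome p-values, split so each worker
# sees only one slice of the data
d      <- data.frame(chr = rep(1:4, each = 2500), p = runif(10000))
slices <- split(d, d$chr)

# Count nominal hits per chromosome in parallel, then combine.
# mclapply() forks on Unix-alikes; on Windows use mc.cores = 1.
hits  <- mclapply(slices, function(s) sum(s$p < 0.05), mc.cores = 2)
total <- sum(unlist(hits))
```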
I use R every day for my research (social simulations, sometimes based on sample surveys). An additional limitation of R is its memory limit: R cannot use virtual memory, so the maximum amount of data it can handle is bounded by RAM.
There are two ways to deal with that. One is to load datasets through a SQL database (using a SQL library), which IMHO is a "dirty hack". The other, which I usually do, is to load the huge dataset into Stata (or any other stats package) and filter it down to a set that is small enough to work with in R.
Other than that, the libraries available for R are crazy good. For example, things like Approximate Bayesian Computation or survey analysis (taking weight factors into account) are straightforward with existing libraries.
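The core of weighted survey estimation is simple enough to sketch in base R with toy numbers; the survey package (svydesign()/svymean()) wraps the same idea with proper variance estimation:

```r
# Toy sample with hypothetical sampling weights
y <- c(12, 15, 9, 20)        # observed values
w <- c(1.5, 0.8, 2.0, 1.2)   # survey weights

# Design-weighted mean: sum(w * y) / sum(w)
wmean <- weighted.mean(y, w)

# The survey package generalizes this, e.g.:
#   library(survey)
#   d <- svydesign(ids = ~1, weights = ~w, data = mydata)
#   svymean(~y, d)
```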
The core libraries available in R are some of the most well-reviewed, carefully written, and correct code available.
There is a huge number of available libraries (thousands!) of variable quality, thanks to the open nature of the project. But commercial software has problems too, especially with new and niche products, and when something goes wrong there, you can't see why for yourself. Worse, independent experts wouldn't have the chance to either.
Having worked on and off with SAS in recent years, I'm aware it has its limitations, but around here we prefer constructive contributions. Would you like to expand on your remarks?