I think open source eventually replaces commercial products, in the same way that proprietary products become commoditized. The response for commercial vendors is also the same: continual differentiation, adding new features, benefits, support, documentation, etc. The exceptions are also the same: natural monopolies (e.g., strong network effects).
Open source is great at hill-climbing: by tapping the collective intelligence of its users, it excels where there are clear directions for improvement, especially for features that users obviously need (provided the project's structure is modular enough to facilitate contributions).
It's not great at "hill-hopping": originating radically different products.
Counter-examples abound. Can you name even one open source app that has displaced a mature, user-facing desktop app with a non-trivial UI, other than a web browser?
Open source only seems to win in domains in which it makes sense for companies to share work in order to compete at a higher tier of functionality.
I don't think it's obvious that open source displaces commercial software in scientific computing. For every example like R, which has in many places displaced S-PLUS, there are counterexamples like MATLAB, whose open-source clone Octave is a bad joke, at least the last time I tried using it: missing functions, slowness, extreme difficulty installing. The same goes for Mathematica, EViews, GAUSS, and Maple.
One other potential factor: a lot of this software is driven by academic use, either because academics use it directly or because academia is where people were first exposed to it, and academics often receive large discounts.
Anecdotally, NumPy (Python) has some traction. Likewise, the survey doesn't consider SQL libraries, and I'm sure there are statistical-analysis libraries for Java. According to the bar chart, R is mentioned by 45% of respondents, SQL by 32%, Python by 25%, and Java by 24%. That seems a more reasonable comparison to me than the graphs higher up in the post.
I use R as my primary data-analysis tool for almost all of my work, with occasional recourse to SAS for certain specialized models (e.g., PROC GLIMMIX for generalized linear mixed models).
My only complaint is the awful default IDE, which can be mitigated to a large extent by scripting elsewhere and source()ing the script. There are also some odd edge behaviors: the mystifying row names of data frames, the difficulty of dropping unused factor levels from aggregated or sliced data (another data-frame issue), and the perhaps unnecessary obscurity of some of the plotting functions (although holding R responsible for the lattice library is unfair).
All that said, for a free tool, it's extraordinary, and the authors of the base language and the many packages that I use have my gratitude.
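For what it's worth, the unused-factor-level annoyance has a one-call fix in base R (droplevels(), available since R 2.12), and the row names can be reset in the same breath. A sketch with made-up data:

```r
# Made-up data frame; slicing keeps all original factor levels around
df  <- data.frame(g = factor(c("a", "a", "b", "c")), x = 1:4)
sub <- df[df$g != "c", ]
levels(sub$g)            # still includes the unused "c" level

sub <- droplevels(sub)   # drop unused factor levels
rownames(sub) <- NULL    # and reset the mystifying row names
levels(sub$g)            # now just "a" "b"
```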
I love R, but I end up using Stata more often because it is easier to produce vector graphics that can be imported into Illustrator. I wish the R community would start to focus on graphics.
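For what it's worth, base R can already write editable vector output through its pdf() and svg() devices; whether the result round-trips cleanly into Illustrator is another matter. A minimal sketch using the built-in mtcars data:

```r
# Write a plot as a vector PDF; Illustrator can edit the resulting paths.
# useDingbats = FALSE avoids a symbol font that trips up some editors.
pdf("scatter.pdf", width = 5, height = 4, useDingbats = FALSE)
plot(mpg ~ wt, data = mtcars, pch = 19,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
dev.off()
# svg("scatter.svg", width = 5, height = 4) works the same way
# where cairo support is available
```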
> Robert A. Muenchen is the author of R for SAS and SPSS Users and, with Joseph M. Hilbe, R for Stata Users. He is also the creator of r4stats.com, a popular web site devoted to helping people learn R. Bob is a consulting statistician with 30 years of experience
Disclaimer: I hate R's syntax, but my company's analytics group uses R for just about everything.
Unfortunately, it's almost impossible to work with very large datasets in R because of its speed limitations. Many researchers I know use MATLAB because of this.
It's probably more an issue of easily pre-filtering/aggregating the data before analysing it with R. I like this approach of moving the calculation to the data, but we must be very late on the adoption curve if Oracle are doing it already.
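A sketch of that "move the calculation to the data" idea, using the DBI interface with an in-memory SQLite database standing in for a real server (this assumes the DBI and RSQLite packages are installed; the table and column names are made up):

```r
library(DBI)  # database interface; RSQLite provides the driver

con <- dbConnect(RSQLite::SQLite(), ":memory:")
big <- data.frame(grp = rep(c("a", "b"), each = 5000), value = rnorm(10000))
dbWriteTable(con, "big", big)

# Aggregate inside the database; only the two-row summary reaches R
small <- dbGetQuery(con,
  "SELECT grp, AVG(value) AS mean_value, COUNT(*) AS n
     FROM big GROUP BY grp")
dbDisconnect(con)
```

Against a real database server the dbConnect() call changes but the pattern stays the same: filter and aggregate in SQL, analyse the small result in R.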
For statistical genetics at least, it's common to process much of the data in parallel, so the RAM limitations on one R instance are not the gating factor.
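That pattern is easy to sketch with the parallel package that ships with R (since 2.14). The chromosome split below is hypothetical; the point is that each worker only ever holds its own slice:

```r
library(parallel)

# Hypothetical per-chromosome p-values, split so each worker
# sees only one slice of the data
d      <- data.frame(chr = rep(1:4, each = 2500), p = runif(10000))
slices <- split(d, d$chr)

# Count nominal hits per chromosome in parallel, then combine.
# mclapply() forks on Unix-alikes; on Windows use mc.cores = 1.
hits  <- mclapply(slices, function(s) sum(s$p < 0.05), mc.cores = 2)
total <- sum(unlist(hits))
```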
I use R every day for my research (social simulations, sometimes based on sample surveys). An additional limitation of R is its memory limit: R cannot use virtual memory, so the maximum amount of data it can handle is bounded by RAM.
There are two ways to deal with that. One is to load datasets through a SQL database (using a SQL library), which IMHO is a "dirty hack". The other, which I usually do, is to load the huge dataset into Stata (or any other stats package) and filter it down to a set that is small enough to work with in R.
Other than that, the libraries available for R are crazy good. For example, things like Approximate Bayesian Computation or survey analysis (taking weight factors into account) are straightforward with existing libraries.
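The core of weighted survey estimation is simple enough to sketch in base R with toy numbers; the survey package (svydesign()/svymean()) wraps the same idea with proper variance estimation:

```r
# Toy sample with hypothetical sampling weights
y <- c(12, 15, 9, 20)        # observed values
w <- c(1.5, 0.8, 2.0, 1.2)   # survey weights

# Design-weighted mean: sum(w * y) / sum(w)
wmean <- weighted.mean(y, w)

# The survey package generalizes this, e.g.:
#   library(survey)
#   d <- svydesign(ids = ~1, weights = ~w, data = mydata)
#   svymean(~y, d)
```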
The core libraries available in R are some of the most well-reviewed, carefully written, and correct code available.
There is a huge number of available libraries (thousands!) of variable quality, thanks to the open nature of the project. But commercial software has problems too, especially with new and niche products, and when something goes wrong there, you can't see why for yourself. Worse, independent experts wouldn't have the chance to either.
Having worked on and off with SAS in recent years, I'm aware it has its limitations, but around here we prefer constructive contributions. Would you like to expand on your remarks?