
Why has R, despite quirks, been so successful?

19 points | MichaelCORS | 10 years ago | blog.revolutionanalytics.com | reply

6 comments

[+] c3534l|10 years ago|reply
I think the real reason is that no matter what crazy machine learning idea you want to implement, you basically just import the data and pass it to a function. Bam. You just trained a convolutional neural net. Calculate the Hamming distance? Sure. No idea what that is, but I'll throw it into a random forest with a couple of other ideas when I'm done, then plot the most important variables. You never really have to learn a new API, which I'm always doing in Python. I don't want to learn a whole new framework: I heard of a thing and I want to see what it does with my data. Python never lets things be as simple as:

    from machinelearning import svm
    with open('/home/me/programming/data.csv') as f:
        data = f.read()

    print(svm(data))

That's why R, which is an awful, buggy, and weird language, is so pleasant to use for stats and ML.
[+] bowyakka|10 years ago|reply
Not to poke holes in this, but is this actually that hard?

    import pandas as pd
    from sklearn.svm import SVC

    df = pd.read_csv('/home/me/programming/data.csv')
    y = df['label']
    X = df.drop('label', axis=1)

    clf = SVC()
    clf.fit(X, y)
... Most of the things have a similar API, except when I veer off into, say, deep-learning land.

This is not to say that R is a bad language, or that R does not have equally nice APIs for this stuff; but I feel that in your case it's a language-familiarity thing.

[+] Lofkin|10 years ago|reply
You clearly haven't used Python's major data science packages.

scikit-learn and statsmodels are known for being especially easy to use, with a uniform API that disparate R packages lack (though caret somewhat fixes this).
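The uniform API point is easy to demonstrate: every scikit-learn estimator exposes the same fit/predict interface, so swapping one algorithm for another is a one-line change. A minimal sketch on synthetic data (the dataset and the particular model choices here are illustrative, not from the thread):

```python
# Sketch of scikit-learn's uniform estimator API: both models are
# trained and queried through identical fit/predict calls.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Synthetic binary classification data, purely for illustration.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

for clf in (SVC(), RandomForestClassifier(random_state=0)):
    clf.fit(X, y)           # same call regardless of algorithm
    preds = clf.predict(X)  # same call regardless of algorithm
    print(type(clf).__name__, len(preds))
```

Because the interface is shared, utilities like cross-validation and pipelines work with any estimator unchanged.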

Further, for Bayesian inference, PyMC3 is more powerful and concise than training a model in Stan.

[+] dthal|10 years ago|reply
I think the key is that R (and S before it) was designed from the start to provide a continuous path from being a user to being a programmer. Unlike real programming languages, you can get a lot of value out of R without really coding. It's a great statistical calculator and has good graphics. Then you can move relatively easily into scripting a few repetitive tasks, and then on into writing simple programs.

One consequence of that is that R has a lot of 'non-programmer programmers', statisticians and domain experts. Some of them write libraries that encode their domain knowledge. Then over time that adds up to having library support for more different types of analysis than any other language. I personally dislike R as a language, but I often end up using it because it has a library for some task that just doesn't exist in Python.

[+] tstactplsignore|10 years ago|reply
Just being an open-source and free platform for performing statistical tests was probably enough to make R extremely successful when it first launched. At the time, Python didn't have any widely used and extensible statistical libraries, and MATLAB and SAS cost a great deal of money and were difficult to deploy. R's growth since then is probably due to the fantastic packaging system.