Ask HN: Python or R?

14 points| washedup | 12 years ago | reply

What's your preference, and why?

More specifically, how does R compare to Python's libraries?

9 comments

[+] svjunkie|12 years ago|reply
What's the problem you're trying to solve? It's hard for anyone to give useful advice without any more context.

That said, I recommend whichever language is easiest for you. I use R and have not fully learned Python, so I have an obvious bias. If you're performing complicated statistical analysis, I'd recommend R, but for more traditional programming, I get the impression that Python interfaces more efficiently with other languages.

[+] stadeschuldt|12 years ago|reply
I use both. I like R for its charting capability and the sheer amount of packages for different use cases. I use Python to pre-process data and get it into because it is a lot easier than in R. Also Scikit-learn, NumPy and Pandas are really nice.
[+] xixi77|12 years ago|reply
As everyone says, depends on what you are trying to do.

For all interactive/exploratory analysis, for statistical graphics, for more advanced statistics, for most statistics-related research work in general, I would definitely pick R out of these two.

If statistics is only a small part of the application, if you already know exactly what you have to do (i.e. no data exploration), if you have to do a lot of web/text processing -- probably Python.

Also, check which one has more/better packages related to what you are doing.

For some stats projects I would go with something else entirely though.

[+] code_scrapping|12 years ago|reply
They're not really directly comparable. Python is a general-purpose programming language; R is a statistical processing tool.

You could compare R to Matlab, or R to a Python library with similar scope (NumPy or pandas).

[+] xixi77|12 years ago|reply
I'm not sure that's the distinction -- there is no shortage of general-purpose programming code written in R, as well as statistics done in Python.

It's more about use cases -- is it all about statistics, or is actual statistics only a small part? Is the focus primarily on research and exploration, or on implementation and deployment?

[+] floppydisk|12 years ago|reply
It really all boils down to what problem you're trying to solve, what kind of analysis you're trying to do, and what the performance requirements are. For basic stats, R and Python will be comparable in terms of library availability and functionality. If you start getting into more specialized and/or esoteric statistics, you will find more R packages (libraries) than you will Python libraries.
[+] bjoerns|12 years ago|reply
I use both for my data analysis (though I'm not doing anything too fancy these days). R is great at statistics (that's what it was designed for) but a bit of a terrible programming language. So I combine the two, do all the IO stuff etc in Python and run the actual statistical analysis in R (RPy is a great Python interface to R).
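
A minimal sketch of the Python half of that split, using only the standard library. The raw data, column names, and helper functions here are hypothetical, invented for illustration; the idea is just that Python tidies a messy export and writes a clean CSV that R could then load (for instance with read.csv) for the actual statistics.

```python
import csv
import io

# Hypothetical raw export: stray whitespace, a missing value, a bad row.
RAW = """id, group ,score
1, a ,3.5
2,b,
3, a ,4.0
x,b,oops
4,b,2.5
"""

def clean_rows(text):
    """Yield (id, group, score) tuples, skipping rows that fail to parse."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    for row in reader:
        if len(row) != 3:
            continue
        raw_id, raw_group, raw_score = (cell.strip() for cell in row)
        try:
            yield int(raw_id), raw_group, float(raw_score)
        except ValueError:
            continue  # drop rows with missing or non-numeric fields

def to_csv(rows):
    """Serialize cleaned rows to CSV text for the R side to read."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "group", "score"])
    writer.writerows(rows)
    return out.getvalue()

cleaned = list(clean_rows(RAW))
print(to_csv(cleaned))
```

In practice the cleaned text would be written to a file and handed to R, or passed across directly through an interface such as RPy; that part is left out here because it depends on the local R setup.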
[+] bjoerns|12 years ago|reply
...although even at the risk of getting laughed at, Excel is an awesome tool for a lot of quick and dirty data analysis stuff (depending on what you do). Throw in a little Python/R (e.g. via datanitro) and it's even more powerful.