I've been using R professionally for 4 years. I've been a Python programmer for 6 years. I've shipped large-scale applications built on both languages to production.
Findings:
I honestly don't get the hype around R's plotting capabilities. ggplot2 is nice, but far too magical to be easily understood by beginners.
Python offers more intuitive memory management than R (which is really more a statement on how bad R is at memory efficiency).
R blows Python out of the water when it comes to expressing concise and readable linear algebra/stats computation. Done right, base R looks like mathematical pseudo-code.
R's non-standard evaluation is an under-appreciated killer feature. I've seen complex and powerful DSLs implemented in base R. Python has some support for this kind of thing, but nowhere near the level of R.
I tried the Tidyverse and found it to be too magical and unstable. In general, the R package ecosystem is weird. Look at Stack Overflow and you'll find people touting all sorts of different packages to do simple operations. Needing to download a million random packages from grad students of the internet is a recipe for disaster.
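To make the non-standard evaluation point above a bit more concrete from the Python side: Python evaluates function arguments eagerly, so the usual workaround is to overload operators and build a deferred expression object that is only evaluated later against the data. The sketch below is a toy illustration of that idea, not how any particular library does it; `Expr`, `col`, and `filter_rows` are made-up names for this example.

```python
import operator


class Expr:
    """A deferred expression: building one records the operation instead of running it."""

    def __init__(self, fn, text):
        self.fn = fn      # callable taking a row (dict) and returning a value
        self.text = text  # human-readable form of the captured expression

    def _binary(self, other, op, symbol):
        # Wrap plain Python values (e.g. 10) so both sides are expressions.
        if not isinstance(other, Expr):
            other = Expr(lambda row, value=other: value, repr(other))
        return Expr(
            lambda row: op(self.fn(row), other.fn(row)),
            f"({self.text} {symbol} {other.text})",
        )

    def __add__(self, other):
        return self._binary(other, operator.add, "+")

    def __mul__(self, other):
        return self._binary(other, operator.mul, "*")

    def __gt__(self, other):
        return self._binary(other, operator.gt, ">")

    def __call__(self, row):
        return self.fn(row)


def col(name):
    """Refer to a column by name; the lookup happens only when a row is supplied."""
    return Expr(lambda row: row[name], name)


def filter_rows(rows, predicate):
    """Keep the rows for which the deferred predicate evaluates to true."""
    return [row for row in rows if predicate(row)]


rows = [
    {"price": 2.0, "qty": 10},
    {"price": 5.0, "qty": 1},
    {"price": 3.0, "qty": 4},
]

# Nothing is computed here; this only builds an expression tree.
big_orders = col("price") * col("qty") > 10
selected = filter_rows(rows, big_orders)
```

The difference from R is that the user has to write `col("price")` rather than a bare `price`: in R the callee can capture the unevaluated argument itself, whereas in Python the caller has to opt in to deferral explicitly.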
But I want ggplot to do magic. Rendering things is extremely tedious and boring. I don't want to spend hours on Stack Overflow searching for the obscure incantations that will shift the legend and margins this way or that because matplotlib did an ugly job, all while constantly adapting boilerplate back and forth between the declarative and object-oriented APIs. I want it to just do what I mean!
Is it a coincidence that there are literally tens of different (and competing) rendering libraries for Python while the R world more or less settled on ggplot2? Writing matplotlib is such a pain in the ass that people go to great lengths to actually avoid using matplotlib (while still not having the expressivity, features and ease of use of ggplot2).
Magic is bad when you're shipping production code that's shared among multiple people who have to then spend a lot of time assimilating the mental model of the magic. It's perfect when you just want to plot stuff and draw nice figures.
Same, and I have to disagree with basically everything you’ve said, save the DSL bit and the R package bit.
In particular, I don’t see how the tidyverse is at all ‘magical’. Take the two most popular tidy libs, dplyr and ggplot2. The API for dplyr is very explicit, intuitive, and based on long-standing precedent (SQL). The API for ggplot2 is admittedly less intuitive, but is itself an implementation of a widely known framework (the Grammar of Graphics). If by magical you mean in their abstractions, those are about as far from magical as one can get.
The tidyverse does get iffy sometimes when you need to dig into the rlang/tidy eval area, but that’s all well-documented.
I haven’t had to use R in a production environment for about a year now, but I always enjoyed the concise tidy API.
Base R is a total mess and largely inconsistent, but I guess that’s what you get when statisticians from different universities patch a language together in their spare time.
> Python offers more intuitive memory management than R (which is really more a statement on how bad R is at memory efficiency).
I haven't found that to be the case. Loading a 2 GB CSV file when you have 8 GB of RAM is touch and go with pandas, but with data.table it’s a breeze, not to mention that operations once the data is loaded are a fair bit faster. The Python version has recently been released and already provides a serious speed bump over pandas, not to mention out-of-the-box memory mapping for huge files. I recommend checking it out if you haven't heard of it: https://h2oai.github.io/db-benchmark/
The Tidyverse is under development and it can be tough to keep up with the changes or know where to start (e.g., plyr vs dplyr, melt vs pivot_longer). However, ggplot2 is based on a pretty solid theory of graphics (Bertin's visual variables and Wilkinson's grammar of graphics). These can take some time to get into but it's not magic once you get the idea that you're mapping data variables to visual variables.
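For anyone who finds the "mapping data variables to visual variables" idea abstract, here is a toy sketch of it in plain Python. It is not ggplot2's implementation or API, just the core concept: a mapping says which column feeds which aesthetic, and a scale turns data values into visual values. `map_aesthetics`, `linear_scale`, and the `cars` data are all invented for illustration.

```python
def map_aesthetics(rows, mapping, scales):
    """Translate data rows into drawable marks via an aesthetic mapping.

    mapping: aesthetic name -> data column, e.g. {"x": "weight"}
    scales:  aesthetic name -> function turning a data value into a visual value
    """
    marks = []
    for row in rows:
        mark = {}
        for aesthetic, column in mapping.items():
            # Fall back to the raw data value when no scale is given.
            scale = scales.get(aesthetic, lambda value: value)
            mark[aesthetic] = scale(row[column])
        marks.append(mark)
    return marks


def linear_scale(domain, pixel_range):
    """Return a function mapping the data domain linearly onto a pixel range."""
    (d0, d1), (r0, r1) = domain, pixel_range
    return lambda value: r0 + (value - d0) / (d1 - d0) * (r1 - r0)


# Hypothetical data, made up for this example.
cars = [
    {"weight": 1000, "mpg": 40, "fuel": "petrol"},
    {"weight": 2000, "mpg": 20, "fuel": "diesel"},
]

marks = map_aesthetics(
    cars,
    mapping={"x": "weight", "y": "mpg", "colour": "fuel"},
    scales={
        "x": linear_scale((1000, 2000), (0, 400)),
        "y": linear_scale((0, 40), (300, 0)),  # flipped: pixel y grows downwards
        "colour": {"petrol": "orange", "diesel": "steelblue"}.get,
    },
)
```

Swapping `"colour": "fuel"` for `"colour": "mpg"` (with a suitable scale) changes the plot without touching the data or the drawing code, which is roughly what the grammar buys you.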
> Needing to download a million random packages from grad students of the internet is a recipe for disaster.
This is baloney. You don't need to rely on a million R packages. You really only need a good plotting library (the built-in one is fine, or use ggplot), a good data frame library (dplyr, data.table, or use the built-in one), and a few extras to handle some weird rough edges (lubridate, forcats).
This gets you 90% of the way to your analysis goals.
Only novices are 'using a million random packages', which is probably the case for Python as well.
> R blows Python out of the water when it comes to expressing concise and readable linear algebra/stats computation. Done right, base R looks like mathematical pseudo-code.
Are you including NumPy as part of Python in this statement? I've found NumPy to be as mathematically expressive as R, although I'm not an R power user.
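As one data point for the comparison, here is a small sketch of ordinary least squares via the normal equations in NumPy, on simulated data (the seed, sample size, and coefficients are arbitrary choices for the example). The base R counterpart of the key line would be `solve(t(X) %*% X, t(X) %*% y)`, so readers can judge for themselves which reads closer to the maths.

```python
import numpy as np

# Simulated data: an intercept column plus two random predictors.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Normal equations: beta_hat = (X'X)^{-1} X'y, solved without forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residual variance and coefficient standard errors.
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (n - X.shape[1])
std_errors = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))
```

The `@` operator keeps the matrix algebra fairly close to the textbook formula; where R arguably still wins on brevity is the model-formula layer (`lm(y ~ x1 + x2)`), which has no counterpart in NumPy itself.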
I've been a Python lover for decades. I dislike a lot of characteristics of the R language, but the ggplot2 package is superior to anything in Python visualization space. It is really excellent.
Wickham was really on to something with ggplot and its graphical grammar concept for plots. That and the Tidyverse in general, I believe, have saved R from irrelevance.
I wish more projects would use that way of thinking, but it seems that in the Jupyter/Julia/Python world there are too many choices and they all attempt a "kitchen-sink" approach for visualization.
I was honestly blown out of the water discovering ggplot2 after years of matplotlib, and a bit infuriated that I didn't make the effort to get into R sooner.
I love Altair, but the practice of encoding data alongside the plots makes it unwieldy for sharing Jupyter notebooks (there is a 5000-row limit by default). GitHub also fails to preview Altair plots in .ipynb files right now.
> The Lets-Plot for Python library includes a native backend and a Python API, which was mostly based on the ggplot2 package well-known to data scientists who use R.
> R ggplot2 has extensive documentation and a multitude of examples and therefore is an excellent resource for those who want to learn the grammar of graphics.
> Note that the Python API being very similar yet is different in detail from R. Although we have not implemented the entire ggplot2 API in our Python package, we have added a few new features to our Python API.
melling | 5 years ago
Of course, I reserve the right to change my opinion after I build a few things. Dynamic typing... hmmm...
sandGorgon | 5 years ago
Here's the path to Altair scrapping 20k lines of code to make the API simpler: https://twitter.com/jakevdp/status/1006929128119926786
Altair is really good. Probably as good as ggplot2.