HN is predisposed to hate R because everyone here is coming from a "real" programming context. Their concerns are generally valid, but they should keep in mind a lot of people using it do not have a software development background and do not care that the language is not elegantly designed: they just want to get analytical work done. In that respect, R is far, far superior to Python. Even something as simple as installing a library is a conceptual leap for these people (why wouldn't the software just come with everything needed to work?). Have you ever tried explaining the various Python package and environment management options to someone with a background in Excel/SQL? Just getting a basic environment set up can be days of frustrating effort (though Anaconda is getting better with this). Compare that to R, where you install RStudio and are off to the races, with a helpful package installation GUI. Another great example: in R, data types are pretty fungible, everything is a vector, coercing things generally "just works". In pandas, it can be very confusing that you need to explicitly turn a 1x1 dataframe into a scalar value. Same thing with Python vs R datetimes.
I understand some of this stuff is actually seen as a positive for Python in some contexts (production usage) and I agree. Just pointing out the woke take is the languages are both good, but good at different things. If I need to run a quick analysis on a dataset, I'm grabbing R 9/10 times. If I'm building a production pipeline, I'm using Python 9/10 times. This is perfectly fine.
It's also worth noting that R becomes much more pleasurable with the Tidyverse libraries. The pipe alone makes everything more readable.
I'm also coming from more of an office setting where everything is in Excel. I've used R to reorganize and tidy up Excel files a lot. ggplot2 (part of the Tidyverse) is also fantastic for plotting; the grammar of graphics makes it really easy to make nice and slightly complex graphs. Compared to my Matplotlib experiences, it's night and day. I'd expect my experience with programming to be quite different from others', though, mainly because any code I write is basically an intermediary step before the output goes back into Excel.
That said, if anyone's interested in learning R from a beginner's level, I can recommend the book R for Data Science. It's available freely at http://r4ds.had.co.nz/ and the author also wrote ggplot2 and several of the other Tidyverse libraries, and works at RStudio.
EDIT: I'm also currently writing my master's thesis in RMarkdown with the Thesisdown package. It's wonderful: it allows for using LaTeX without really knowing LaTeX, which is great for us in business school.
No, you are wrong. R is terrible, and especially so for non-professional programmers, and it is an absolute disaster for the applications where it routinely gets used, namely statistics for scientific applications. The reason is its strong tendency to fail silently (and, with RStudio, to frequently keep going even when it does fail.) As a result, people get garbage results without realizing, and if they're unlucky, these results are similar enough to real results that they get put somewhere important. Source: I'm a CS grad working with biologists; I've corrected errors in the R code of PhD'd statisticians, in "serious" contexts.
Scientific applications require things to fail hard and often, to aggressively fail whenever anything is potentially behaving incorrectly. R does the exact opposite of that in several different, pernicious ways. IMHO, Python is more dangerous than a scientific computing language should be, but at least it will stop when it hits an error. R has undoubtedly cost humanity millions of dollars in wasted research costs and caused untold confusion, from otherwise perfectly-performed studies reporting corrupted statistical results. The world would be a noticeably better place without it.
I simply cannot articulate my opinion about R without sounding grossly hyperbolic. I'm sad that HN, a place which is typically enlightened in the ways of the programming arts, is so confused about what this article is on about. If we tolerate such blatantly hostile design in something as important as the language of scientific statistics, where do we expect to get?
The thing to keep in mind is that, from the point of view of someone who works with data, R isn't a programming language. It's a statistical software package that has a programming language. Its competitors are things like Minitab, SPSS, Stata, and JMP, all of which used to be entirely menu-driven. R was a genuine innovation when it was first introduced.
Now it's certainly showing its age and the limits of its design, but it's still best in class for a certain kind of user. We could do better for software development, but it's not clear that doing so would actually make data analysis easier.
> Their concerns are generally valid, but they should keep in mind a lot of people using it do not have a software development background and do not care that the language is not elegantly designed
To me, the opposite is true. People with no CS background would benefit the most from a simple design.
> in R, data types are pretty fungible, everything is a vector, coercing things generally "just works".
Things just work until they don't, and then you need to understand all the weirdness of R.
I don't know what the typical experience of a non-programmer with R is, but as a programmer, I had some headaches trying to understand R semantics (apparently I'm not the only one [1]).
I use R at least a couple times a week. It gets the job done and I will be forever grateful for the tidyverse.
That said, R can be goddamn frustrating at times because of the way the documentation is written. It would be nice to simply be able to query about a function and get a cogent help file that explains THE BASICS of how to use the function for the most common use-case(s). Instead, the help files try to be "canonical" and front-load a bunch of useless technical detail-- like that something is an "S3" object. Still haven't figured out what that really means and, I expect, knowing that something is "S3" will NEVER help me out when I am in a jam and need a little help to do something simple because I forgot some data manipulation detail.
Instead, I end up googling all the time, connecting the dots all over the internet to get very simple stuff done. At least now we have Stack Overflow which, as vicious as it is, seems like Mister Rogers' Neighborhood compared to the old R mailing list.
Yes, a couple of years ago I did a project where I needed to get a ton of analysis done, typically with plots and tables as output. I did this in Python and it was a huge mess with pandas and matplotlib. I re-did it in R with data.table and ggplot2 and it was just ridiculously easier, and I could expand upon the code much more easily, plus the output was much prettier.
> Even something as simple as installing a library is a conceptual leap for these people (why wouldn't the software just come with everything needed to work?).
> Have you ever tried explaining the various python package and environment management options to someone with a background in Excel/SQL?
I don't understand the difficulty I've often seen voiced against this. Why would a newbie or someone who just wants to get analytical work done need anything beyond installing Python and doing `pip install library`? It's certainly orders of magnitude easier and faster than, say, using a C library. The only trouble I can see a newbie running into is if they want to install a library which doesn't have precompiled wheels and they need some dependencies to build it, but that's rarely an issue for popular packages.
> install RStudio and are off to the races, with a helpful package installation GUI.
Unless the package needs a native component, like a particular version of libcurl; then it can turn into a couple of hours of blindly trying everything you can think of.
> Another great example: in R, data types are pretty fungible, everything is a vector,
Unless it's a dataframe or factor or string or s3, s4 or s5 or a couple of other things.
And the documentation will tell you the reference paper that you can read and some completely impractical example.
> Their concerns are generally valid, but they should keep in mind a lot of people using it do not have a software development background and do not care that the language is not elegantly designed: they just want to get analytical work done.
This implies we strive for good design in languages just because it appeases some ideal we have about how languages should be. But really we strive for good design in languages because it makes them more powerful, more expressive, easier to use, etc. Sure, maybe Python doesn't have all the right abstractions to be perfectly suited to statistical tasks, whereas R has more natural abstractions for that kind of stuff. But that doesn't mean that R doesn't also have many objectively bad design decisions even for statistical uses.
As a datapoint in agreement.
Some of the biologists I work with love R. I had one tell me that it's like how they think, and they weren't a Python fan.
I think RStudio (an R-based IDE that turns it into a more Excel-like experience), where you can inspect the data in memory (including matrix data) and make graphs, is what really helps bring people into the R language. And with a set of instructions, anyone can load the analysis packages and do their data analysis.
Compare this to Python, where they have to go to the Unix shell, set up the environment, and load the libraries. When they come back, they have to reset everything to get back to where they started.
Another feature for this audience is the philosophy that functions shouldn't have side effects. You can still do several types of object-oriented programming in R, but it does take away some of the ways in which non-programmers shoot themselves in the foot.
I've come to really like the way environments work in R, as well.
Good point about the package management, but I disagree with your argument. Non-Computer Scientists seem to have a much easier time with Python than R, anecdotally.
I think the reason is that R is not just a badly designed language, but in particular its design is inconsistent. That’s as confusing to newcomers as it is to people who care about PL design.
I used R for almost a decade. Last year I switched to Python and Jupyter, never looked back. Can’t recommend the switch highly enough. R has great stats packages, but struggling with the language is just not worth it.
I came from a "real" programming background. This was pre-Tidyverse. Learning R thoroughly was the best thing I could have ever done for my proglang understanding as all the weird things R had meant that every time I learned something new I could say "Oh, it's like X in R".
> HN is predisposed to hate R because everyone here is coming from a "real" programming context.
There is that. Matlab has the same problem.
One problem with "real programming languages" is that programmers who grew up with C don't see any need for built-in multidimensional arrays. This is one reason FORTRAN is still around, and why array work is straightforward in Matlab and R.
Anaconda is becoming to python what Chrome is to browsers, particularly as Jupyter matures. Drop it in, and a huge amount of what you want to do is ready to go. Sure, there's lots of libraries/extensions available, but most of the time you can do real work with a default userland, non-privileged install.
And there is literally no equivalent to dplyr and ggplot2 in Python. Those alone can make a huge difference in how many lines you need to write to do something.
I actually totally agree with this. I learned how to program in R and found it to be quite wonderful to use as a noob. As you say, shit just works. If you think you should be able to do an operation, you typically can. To this day I still prefer cleaning and doing proof of concept analyses in R rather than Python. It's so much easier than having to fuck with pandas and numpy.
One thing common to most of the "real" programming languages makes them unfit for data work: 0-based indexing.
It is just ridiculous to call the first row in a data set as 0th row, and the last row as (n-1)th row. It does not make any sense for data analytic work.
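For anyone who hasn't hit this, a small Python sketch of what the complaint looks like in practice. The three-row `rows` list is a made-up example, not data from the thread; `enumerate(..., start=1)` is the standard-library way to get 1-based row labels without changing how indexing works.

```python
rows = ["alice", "bob", "carol"]   # hypothetical three-row data set

first = rows[0]                # Python's first row lives at index 0
last = rows[len(rows) - 1]     # last row is index n-1 (rows[-1] is the usual shortcut)

# enumerate(start=1) yields human-friendly row numbers; indexing itself stays 0-based.
numbered = [(i, name) for i, name in enumerate(rows, start=1)]
print(numbered)
```

Whether this is "ridiculous" or just a convention is the argument above; the sketch only shows where the off-by-one translation happens.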
As a long-time R user, I agree with all of these complaints. The language itself is ugly and actively tries to get in your way.
I'll add that concepts like data frames are not really intrinsic, and you get needless complexities like "length", "nrow", "dim", each of which does the wrong thing in 90% of the scenarios of interest. The confusion of lvalues is another strange quirk -- a <- 0; length(a) <- 20 is totally valid, and you get things like class(a) <- 'foo' being preferred over the equivalent a$class <- foo. It has all sorts of odd concepts between lists and data.frames -- the double-bracket syntax, etc. The object model is very confusing, though most people seem to have converged on the S3 system, which is the oldest one.
If you discipline yourself to learning "the good parts", especially by learning either data.tables or tidyverse or becoming a master of split/lapply/aggregate/ave, then it is very powerful. The modelling tools and plotting (both base graphics and ggplot2) are excellent.
I'd love to see a NeoR arise at some point that fixes the strange historical inconsistencies (like what happens when you refer to vec[0], as noted by the author) in non-backward compatible ways.
This is a stupendous example of someone going overboard on their criticisms in order to grandstand.
R may not be the most "beautiful" language in a general perspective, but it certainly is more beautiful than Python when it comes to actual data analysis. There is nothing in R that is as ugly as even the best implemented pandas, numpy, and matplotlib code. All of the options in Python, which is generally pointed to as the "superior" language to R, feel tacked on and hackish.
The real story behind most of the complaints is that they come from software developers who only rarely need to do data analysis that would require R, and therefore use it infrequently and mistake their unfamiliarity with the language with the language being bad.
I also groaned at the part where the author struggled to google questions about R because of its "stupid name". I have literally never, ever had issues Googling anything about R, the same way I haven't ever had issues finding answers to my questions about "Python". The author is grasping at straws, and this is a programming blog's equivalent of clickbait.
I think many of the gotchas and annoying parts of base R are solved by using tools from the tidyverse: http://github.com/tidyverse. For example, the pain of needing to specify `stringsAsFactors=FALSE` is solved in the tibble package by setting a sensible default.
At any rate, at least it's not Pandas and matplotlib...
I learned R coming from Java, Node, PHP and Python and I love it !!! It is awful as an application development programming language, but it was never designed for that purpose. It was designed for STATISTICS. Try to achieve advanced statistics with your traditional software engineer's preferred language and see which language you hate then. The only tricky R concepts to learn for newbies are: recycling, formulas and vectorized functions. Add RevoScaleR to R and it kicks major ass when dealing with big data manipulation. Oh yes, big time !!!
I use R a lot and I have to say some of these comments are weird.
1. R and Lisp are hardly alike, even if R was inspired by it. It's like saying Erlang and Prolog are very similar. If you want to learn FP, do it in Erlang, Lisp, Haskell, etc. Don't do it in R; it's half-baked.
2. R syntax is ugly, with warts. But built-in data types like the data frame, the factor type, and the NA (missing value) notion make this language much better than many languages out there for dealing with data. Subsetting a data frame is a breeze even in base R.
3. There are many, many advanced statistical packages only in R. GLMnet was in R for 4-5 years before someone decided to port it to Python. You can argue that there might be an alternative package. But the statisticians who created the ridge, elastic net, etc. methods made GLMnet. There are many statisticians out there who just implement their latest method in R. If you want to learn a subject in statistics, there is probably a book out there and it'll have an R package and code to go along with it. Next to that will be SAS. There are very few stats books with Python packages. You want to learn Bayesian statistics? Social network analysis? There's a book for it with R code and a package to do that. Good luck finding one in Python for these subfields of statistics. There's a book on Bayesian hierarchical analysis in ecology, and that book is in R.
4. ggplot2 is amazing for static graphics. R doesn't have good dynamic graphics and I'm kinda meh on Shiny. If you hate the syntax, you may learn to appreciate it by reading about it from the creator: https://www.r-bloggers.com/a-simple-introduction-to-the-grap...
>A R factor is a sequence type much like a character atomic vector except that the values of the factor are constrained to a set of string values, called “levels”. For example, if you have a table of measurements of some widgets and each row corresponds to a single measurement of a single widget, you could have a factor-typed column called measurement.type containing the values “length”, “width”, “height”, “weight”, and “hue”, with the corresponding numeric measurements stored in a “value” column.
This is a very bad example of what factors are for in R, because it makes it seem like factors are for defining variables or keys in key value pairs. You can use them for that, but it isn't the intended use. A better example would be:
suppose you were comparing the amount of sugar in fruits based on several growing locations, and you had three columns:
| Fruit | Location | Density (g/L) |
Fruit would be a factor variable (let's say it takes the possibilities of apple, banana, orange), and location could be too, if it were a discrete set of possibilities (as opposed to lat/lon coords)
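To make the "discrete set of possibilities" idea concrete for readers coming from Python, here is a toy sketch of a factor-like column. The `Factor` class and its `counts` method are hypothetical names invented for illustration, not R's or any library's API. One deliberate difference, flagged as an assumption: this version raises on a value outside the levels, whereas R would store NA for such values rather than stop.

```python
class Factor:
    """Toy categorical column: values restricted to a fixed set of levels."""

    def __init__(self, values, levels):
        self.levels = list(levels)
        index = {level: code for code, level in enumerate(self.levels)}
        unknown = [v for v in values if v not in index]
        if unknown:
            # Assumption: fail loudly. R would store NA here instead of stopping.
            raise ValueError(f"values outside levels: {unknown}")
        self.codes = [index[v] for v in values]   # stored as integer codes

    def counts(self):
        # Every level is reported, even ones with zero observations.
        return {level: self.codes.count(code)
                for code, level in enumerate(self.levels)}


fruit = Factor(["apple", "banana", "apple"], levels=["apple", "banana", "orange"])
print(fruit.counts())
```

The point of the level set shows up in `counts()`: "orange" is still reported, with a count of zero, which is the kind of behaviour that makes factors useful for grouping and modelling.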
This author seems to forget that R was built for working with data in an analytical setting, unlike all of the languages he's comparing it to. It has crept into other areas, but that seems to be because in the hands of a skilled user it is far easier to implement a data analysis solution. I'm sure someone will come in and say how much better pandas is, but on small datasets, I'll stick with R, especially with how brittle and buggy matplotlib is.
I don't understand why HN hates R. HN loves lisp, and R as a language shares a much greater affinity with lisp languages than python or Go do. The language was born out of the original authors reading SICP (as statisticians). Sure, many of the users of R molded it to look like what they were used to (S), but that just highlights the powerful metaprogramming capabilities of the language.
> HN hates R. HN loves lisp, and R as a language shares a much greater affinity with lisp languages than python or Go do
HN was started by someone who wrote a book on, and flipped a startup using, Lisp. Python and Go are both used a lot by a buyer of startups, and HN exists to deliver startups, or their IP, to those buyers. R is more of a language for helping its users do data analysis, perhaps in corporate offices, and hence doesn't have as much use for HN's business purpose. Submissions on Python and Go are more likely to stick to HN's front page.
I use lisp and R. While R evolved from lisp, which lets me understand how it does certain things, I don't know if I'd describe it as more like lisp than python.
Indeed, I use the following analogy to describe to friends why I have a strong emotional distaste for R:
Imagine you grew up as a heterosexual male. In your early years, you have fond memories of a young girl whom you had a fling with.
She drops off your radar, and you run into her 30 years later. She's gotten breast implants, botched her face with plastic surgery, and went through a rather traumatic divorce and reinvention of herself.
To your friends who lived on an island where there were no women, and who were kept in basements and regularly beaten by other stats programs, she might even be beautiful, and she certainly pays attention to the base desires they crave.
To you, she'll always be a mangled shadow of her former self and what could have been...
I won't speak for HN, but here's a tongue in cheek summary of my experience with R.
I write a script. It doesn't work. I don't know why. I look at the error message, and then google for 30 minutes to understand what it really means - which parts of the code broke, why, how to fix them. Because none of the 3 things (which, why, how) is easy to get to.
OK, I fix it, having learned something new (like that there are infinite special cases with almost any functions).
I commit it to repo, go for coffee. In the afternoon, a colleague asks how to run that code. Well, it was a simple script, half a page, what's the problem?
I take a look, and on their machine it doesn't run. We don't know why. An hour later we discover she has some R profile file with a setting that changes behavior of some standard library... and she also has different encoding set as default, and so on, and so forth... whatever. I don't know why runtime environment encoding changes behavior of code that only deals with numbers, but hey! It's interesting at least.
We fix it, we are happy.
A few days later I run the script again. It works. The result doesn't look right though. It's mostly zeroes. Hmm.
I run it a few more times, playing around with input, trying to figure out what's up.
OK, after a few minutes I realize there's lots of red color that flashes on running the script on my screen - just so fast I barely see it.
It turns out half the code isn't really running, the script just ignores it though (errors do NOT stop the code from running), and keeps going. It produces partial output happily announcing it finished.
That is the most serious mindfuck. Everything is OK, says the prompt, here's your 1 megabyte result of the calculation, oh, just don't look at the numbers, because I haven't really run any of the code... I couldn't find one of the functions.
I sit there wondering. Which is worse: the fact that every time I try launching the script something else is happening, or the fact that the runtime environment by default will return garbage with NO warning at the end (which is the only thing you see on screen) but with a million warnings in between (which you won't see unless you have really good reflexes...).
Which is worse?
I decided at some point, that I want a language to fail, and to always give me the same result. An error, an exception, this should kill the program and shout as loud as possible "Won't give you anything". Also I want code that ran yesterday to run today, and to run on my colleague's machine, and on a newer version of R. This was never our experience.
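A minimal Python sketch of the "fail and shout" behaviour being asked for here. The function names and the input list are hypothetical; the only real behaviour relied on is that `sum()` raises `TypeError` when it meets `None`, so nothing after the failing step runs and no partial result is returned.

```python
def load_rows():
    # Hypothetical input: one measurement is missing.
    return [1.5, 2.0, None, 4.0]


def mean(values):
    # sum() raises TypeError on None instead of silently skipping it.
    return sum(values) / len(values)


def run_pipeline():
    rows = load_rows()
    result = mean(rows)          # raises here; nothing below this line runs
    return {"mean": result}


if __name__ == "__main__":
    try:
        print(run_pipeline())
    except TypeError as exc:
        # Caught only to make the failure visible in this demo; without the
        # handler the interpreter exits with a traceback and a non-zero status.
        print(f"pipeline stopped: {exc}")
```

On the R side, the closest knobs I'm aware of are `stopifnot()` for explicit checks and `options(warn = 2)` to turn warnings into errors; neither is on by default.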
R is much more lisp-like than python: more functional, more emphasis on DSLs and metaprogramming, and the community is far more inviting to newcomers: reminds me a lot of Racket in that regard.
Yes! People ask me what the best R programming book I’ve ever read is, and I like to say “How to Design Programs.”
I keep thinking someday I’ll try and build some of the missing stats infrastructure in Racket. The problem at present is time and that this stuff is real work!
Having used Python, JSL, Julia, R and Matlab; I agree with most of the things in R. R is an extremely ugly language. It seems to be created by people who wear capris and uggs (both at the same time). But, R has incredible packages, especially the work done by Hadley Wickham. ggplot2 is beautiful. It is utterly gorgeous. It is what Ted Baker is to the capri guys that designed the language itself.
My personal hack to deal with the unbearable ugliness of R is to use Rpy2 and call R packages from Python --- at least writing some boilerplate code in Python makes me happier than having to write in R.
ggplot2 produces beautiful graphs. I don't think it's beautiful as a package -- the syntax is strange and reflects an earlier evolution of the ideas that went into the tidyverse.
Notably the use of + instead of chaining operators, the use of a custom "ggproto" object system instead of S3 (which makes extensibility a nightmare), and the superfluous presence of the aes() function (rendered unnecessary by better lazy evaluation tricks not really well-explored at the time).
> There are subtle differences and some authorities prefer <- for assignment but I’m increasingly convinced that they are wrong; = is always safer. R doesn’t allow expressions of the form if (a = testSomething()) but does allow if (a <- testSomething()).
...I am confused here. If you're testing for equality, R requires you to use == and not =. If you try to test for equality with =, it throws an error instead of treating it as an assignment. That's good. But who is trying to test for equality with <-?
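The quoted passage is about assignment inside a condition, not equality testing. Python happens to draw the same line, which may make the distinction easier to see: plain `=` is rejected inside `if (...)`, while the dedicated `:=` operator (Python 3.8+) plays the role the article gives to `<-`. `check_something` below is a hypothetical stand-in for the article's `testSomething()`.

```python
def check_something():
    # Hypothetical stand-in for the article's testSomething().
    return 42


# Assignment inside a condition needs the dedicated := operator (Python 3.8+).
if (a := check_something()):
    print("a is", a)

# Plain = inside a condition is rejected at compile time, much as the article
# says R rejects `if (a = testSomething())`.
try:
    compile("if (a = check_something()):\n    pass", "<demo>", "exec")
    plain_equals_allowed = True
except SyntaxError:
    plain_equals_allowed = False

print(plain_equals_allowed)
```

So the article's "safer" argument is that a slip from `==` to `=` gets caught, while a slip to `<-` would silently assign.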
If you have tried R and found it painful I can’t say enough good things about the “R for Data Science” book. Great overview of Tidyverse, ggplot. After 50 pages I was further along than my previous 5 years of googling and cursing.
There are a bunch of odd non-standard syntax choices in this tutorial. For example, the author ends statements with semicolons. R does allow equal sign assignment (although style guides prefer the stupid arrow syntax). The author mentions Bioconductor, the... second biggest package repository for the language?
I clicked because I was a programmer for 15 years before I used R, and I have subsequently developed and shipped R packages, so I feel like I'm in a pretty good position to get the visceral, cathartic, "argh" the writer here was going for.
The "stupid arrow syntax" is not nearly so stupid, and I speak here as a 10+ year Python developer, as the misuse of the equal sign that every C-style language seems to think is OK. The = sign meant something long before computer programming, and the developer community ought to be ashamed that it is being used for "change this thing to that". /rant
The second time, I had a purpose and found enough code to copy to achieve it. I was rewarded.
The third time, I had a more complex problem demanding use of JSON from Elastic Search, and found that the two packages out there in git are basically orphanware, use dplyr in extremely confusing ways, and offer little or no advantage to simplistic HTTP fetching and direct to JSON parsing. Which is a huge shame, because the idea of an elastic search abstraction is very attractive. But. "it just didn't work out of the box"
I am very clear I am an R "consumer", not an R developer. But, at this point, absent Shiny and a GUI, I think that Python and NumPy have basically as much to offer me.
Some people say the syntax is FP friendly. I have been trying to learn FP in Haskell and I think R is about the worst notation you could invent to sell FP.
Great guide. R syntax is awful, no matter how you slice it and would have been the tool of choice before Python could walk. Still, it is very powerful and great to have around.
I use R most of the time and I find R notebooks very data-exploration friendly. They make it easy to go back and forth, just like a Jupyter notebook. Producing HTML files from RMarkdown files is also analysis friendly.
99% of the time I use the tidyverse with no noticeable impact on performance. For that occasional 1%, I must admit the data.table package works out really well. Tidyverse pipes are so Unix-y that it's easy to transition to commands such as cut, head, sort and column if needed, without any mental contortion.
I have used Python occasionally and with method chaining, it can almost simulate the "dplyr" like syntax. However, it is hard to find some obscure statistical test out of the box which is easy in R.
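As a rough illustration of the method-chaining point, here is a toy sketch in plain Python. The `Table` class, its `filter`/`mutate`/`select` methods, and the fruit numbers are all hypothetical and only borrow dplyr's verb names; in practice one would chain pandas methods instead, which this sketch does not attempt to reproduce.

```python
class Table:
    """Toy row container whose methods return a new Table, so calls chain."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, predicate):
        return Table(r for r in self.rows if predicate(r))

    def mutate(self, **new_columns):
        # Each keyword is a column name mapped to a function of the row.
        return Table({**r, **{k: f(r) for k, f in new_columns.items()}}
                     for r in self.rows)

    def select(self, *names):
        return Table({n: r[n] for n in names} for r in self.rows)


# Made-up measurements, purely for the demo.
data = Table([
    {"fruit": "apple", "sugar_g": 100, "litres": 1.0},
    {"fruit": "banana", "sugar_g": 120, "litres": 1.0},
    {"fruit": "orange", "sugar_g": 90, "litres": 1.0},
])

result = (
    data
    .filter(lambda r: r["sugar_g"] >= 100)
    .mutate(density=lambda r: r["sugar_g"] / r["litres"])
    .select("fruit", "density")
)
print(result.rows)
```

Each step reads top to bottom like a pipe, which is the "almost" part; the lambdas are where it stays noisier than dplyr's bare column names.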
There are some highly productive researchers who have taken the time to write well constructed and very useable packages in R, married with meticulous documentation. The same individuals or groups of 1-3 people have also maintained said packages for 5 years or more, and regularly respond in person to queries. Two examples are ggplot2 and limma.
To me, this is all the encouragement I need to use R.
Some reading this will claim any criticism is the fault of the critic. Others will jump in claiming it is all revealing the emperor's clothing and promoting an alternate religion.
A few will deconstruct the criticisms and look for the small documentation or even language changes to solve something.
Hmmmmm, okay after 20 minutes of squinting at the script I see there's a lowercase letter at the header name of the last column somewhere in the middle of the 200 lines...now we can move on to debugging the next "Error"...
[+] [-] extr|7 years ago|reply
[+] [-] bokstavkjeks|7 years ago|reply
[+] [-] maxander|7 years ago|reply
[+] [-] joker3|7 years ago|reply
[+] [-] yodsanklai|7 years ago|reply
To me, the opposite is true. People with no CS background would benefit the most from a simple design.
> in R, data types are pretty fungible, everything is a vector, coercing things generally "just works".
Things just work until they don't, and then you need to understand all the weirdness of R.
I don't know what the typical experience of a non-programmer with R is, but as a programmer, I had some headaches trying to understand R's semantics (apparently I'm not the only one [1]).
[1] http://r.cs.purdue.edu/pub/ecoop12.pdf
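For contrast, here is a minimal Python sketch of the strict end of the spectrum, where mixing a string and a number is rejected rather than coerced. The helper name `try_add` is made up for illustration. (In R, by comparison, `c(1, "a")` quietly gives you a character vector.)

```python
def try_add(a, b):
    """Return a + b, or the name of the exception Python raised."""
    try:
        return a + b
    except TypeError as exc:
        return type(exc).__name__


mixed = try_add("1", 1)          # str + int is not coerced
explicit = try_add(int("1"), 1)  # the conversion has to be spelled out

print(mixed, explicit)
```

Whether that strictness is a feature or a nuisance is exactly what this subthread is arguing about.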
[+] [-] crispyambulance|7 years ago|reply
That said, R can be goddamn frustrating at times because of the way the documentation is written. It would be nice to simply be able to query about a function and get a cogent help file that explains THE BASICS of how to use the function for the most common use-case(s). Instead, the help files try to be "canonical" and front-load a bunch of useless technical detail, like that something is an "S3" object. Still haven't figured out what that really means and, I expect, knowing that something is "S3" will NEVER help me out when I am in a jam and need a little help to do something simple because I forgot some data manipulation detail.
Instead, I end up googling all the time, connecting the dots all over the internet to get very simple stuff done. At least now we have stackoverflow which, as vicious as it is, seems like Mister Rogers' Neighborhood compared to the old R mailing list.
[+] [-] newen|7 years ago|reply
[+] [-] Avshalom|7 years ago|reply
Yes, well, from the outside it doesn't seem like Python programmers have any better grasp of those either.
[+] [-] Giroflex|7 years ago|reply
I don't understand the difficulty I've often seen voiced against this. Why would a newbie or someone who just wants to get analytical work done need anything beyond installing Python and doing `pip install library`? It's certainly orders of magnitude easier and faster than, say, using a C library. The only trouble I can see a newbie running into is if they want to install a library which doesn't have precompiled wheels and they need some dependencies to build it, but that's rarely an issue for popular packages.
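For what it's worth, the environment step can be scripted from Python itself. A minimal sketch using only the standard-library `venv` module; the directory name `analysis-env` is a placeholder, and `with_pip=False` is chosen here only to keep the example quick (normally you would want pip inside the environment).

```python
import tempfile
import venv
from pathlib import Path


def make_env(parent: Path, name: str = "analysis-env") -> Path:
    """Create an isolated virtual environment under `parent` and return its path."""
    env_dir = parent / name
    venv.create(env_dir, with_pip=False)
    return env_dir


# Build the environment in a throwaway directory and check it was written.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = make_env(Path(tmp))
    has_config = (env_dir / "pyvenv.cfg").exists()

print("environment created:", has_config)
```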
[+] [-] yread|7 years ago|reply
Unless the package needs a native component, like a particular version of libcurl; then it can turn into a couple of hours of blindly trying everything you can think of.
> Another great example: in R, data types are pretty fungible, everything is a vector,
Unless it's a dataframe or factor or string or s3, s4 or s5 or a couple of other things.
And the documentation will tell you the reference paper that you can read and some completely impractical example.
Ugh, feels better now, sorry for the rant.
[+] [-] shawnz|7 years ago|reply
This implies we strive for good design in languages just because it appeases some ideal we have about how languages should be. But really we strive for good design in languages because it makes them more powerful, more expressive, easier to use, etc. Sure, maybe Python doesn't have all the right abstractions to be perfectly suited to statistical tasks, whereas R has more natural abstractions for that kind of stuff. But that doesn't mean that R doesn't also have many objectively bad design decisions even for statistical uses.
[+] [-] acomjean|7 years ago|reply
I think RStudio (an R-based IDE that turns it kinda into a more Excel-like experience), where you can inspect the data in memory (including matrix data) and make graphs, is what really helps bring people into the R language. And with a set of instructions anyone can go load the analysis packages and do their data analysis.
Compare this to Python, where they have to go to the Unix shell, set up the environment, and load the libraries. When they come back, they have to reset everything to get back to where they started.
[+] [-] gbrown|7 years ago|reply
I've come to really like the way environments work in R, as well.
[+] [-] randomsearch|7 years ago|reply
I think the reason is that R is not just a badly designed language, but in particular its design is inconsistent. That’s as confusing to newcomers as it is to people who care about PL design.
I used R for almost a decade. Last year I switched to Python and Jupyter, never looked back. Can’t recommend the switch highly enough. R has great stats packages, but struggling with the language is just not worth it.
[+] [-] jghn|7 years ago|reply
[+] [-] Animats|7 years ago|reply
There is that. Matlab has the same problem.
One problem with "real programming languages" is that programmers who grew up with C don't see any need for built-in multidimensional arrays. This is one reason FORTRAN is still around, and why array work is straightforward in Matlab and R.
[+] [-] killjoywashere|7 years ago|reply
Anaconda is becoming to python what Chrome is to browsers, particularly as Jupyter matures. Drop it in, and a huge amount of what you want to do is ready to go. Sure, there's lots of libraries/extensions available, but most of the time you can do real work with a default userland, non-privileged install.
[+] [-] ekianjo|7 years ago|reply
[+] [-] Thriptic|7 years ago|reply
[+] [-] educationdata|7 years ago|reply
It is just ridiculous to call the first row in a data set as 0th row, and the last row as (n-1)th row. It does not make any sense for data analytic work.
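To be fair, the storage index and the row number you show an analyst don't have to be the same thing. A small Python sketch with made-up row labels: the list is indexed from 0, but `enumerate(..., start=1)` numbers the rows from 1 for display.

```python
rows = ["alice", "bob", "carol"]  # illustrative data only

# Python indexes from 0 internally...
first = rows[0]
last = rows[len(rows) - 1]

# ...but the row numbers shown to a reader can start at 1.
numbered = list(enumerate(rows, start=1))

print(first, last)
print(numbered)
```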
[+] [-] andrewla|7 years ago|reply
I'll add that concepts like data frames are not really intrinsic, and you get needless complexities like "length", "nrow", "dim", each of which does the wrong thing in 90% of the scenarios of interest. The confusion of lvalues is another strange quirk -- a <- 0; length(a) <- 20 is totally valid, and you get things like class(a) <- 'foo' being preferred over the equivalent a$class <- foo. It has all sorts of odd concepts between lists and data.frames -- the double-bracket syntax, etc. The object model is very confusing, though most people seem to have converged on the S3 system, which is the oldest one.
If you discipline yourself to learning "the good parts", especially by learning either data.table or tidyverse or becoming a master of split/lapply/aggregate/ave, then it is very powerful. The modelling tools and plotting (both base graphics and ggplot2) are excellent.
I'd love to see a NeoR arise at some point that fixes the strange historical inconsistencies (like what happens when you refer to vec[0], as noted by the author) in non-backward compatible ways.
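As a rough idea of what "fixed" could look like, here is a Python sketch of a 1-based accessor that refuses position 0 outright (R's `vec[0]` hands back an empty vector instead of complaining). The function name `at` is hypothetical, not anything from R or a Python library.

```python
def at(vec, i):
    """Return the i-th element using 1-based positions, rejecting anything else."""
    if not 1 <= i <= len(vec):
        raise IndexError(f"position {i} is outside 1..{len(vec)}")
    return vec[i - 1]


vec = [10, 20, 30]
print(at(vec, 1))

try:
    at(vec, 0)
except IndexError as exc:
    message = str(exc)
    print("rejected:", message)
```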
[+] [-] thousandautumns|7 years ago|reply
R may not be the most "beautiful" language in a general perspective, but it certainly is more beautiful than Python when it comes to actual data analysis. There is nothing in R that is as ugly as even the best implemented pandas, numpy, and matplotlib code. All of the options in Python, which is generally pointed to as the "superior" language to R, feel tacked on and hackish.
The real story behind most of the complaints is that they come from software developers who only rarely need to do data analysis that would require R, and therefore use it infrequently and mistake their unfamiliarity with the language for the language being bad.
I also groaned at the part where the author struggled to google questions about R because of its "stupid name". I have literally never, ever had issues Googling anything about R, the same way I haven't ever had issues finding answers to my questions about "Python". The author is grasping at straws, and this is a programming blog's equivalent of clickbait.
[+] [-] huac|7 years ago|reply
At any rate, at least it's not Pandas and matplotlib...
[+] [-] JVerstry|7 years ago|reply
[+] [-] digitalzombie|7 years ago|reply
1. R and Lisp are hardly alike, even if R was inspired by it. It's like saying Erlang and Prolog are very similar. If you want to learn FP, do it in Erlang, Lisp, Haskell, etc. Don't do it in R; it's half-baked.
2. R's syntax is ugly, with warts. But built-in data types like the dataframe, the factor type, and the NA (missing value) notion make this language much better than many languages out there for dealing with data. Subsetting a dataframe is a breeze even in base R.
3. There are many, many advanced statistical packages only in R. GLMnet was in R for 4-5 years before someone decided to port it to Python. You can argue that there might be an alternative package. But the statisticians who created the ridge, elastic net, etc. methods made GLMnet. There are many statisticians out there who just implement their latest method in R. If you want to learn a subject in statistics, there is probably a book out there, and it'll have an R package and code to go along with it. Next to that will be SAS. There are very few stats books with Python packages. You want to learn Bayesian statistics? Social network analysis? There's a book for it with R code and a package to do that. Good luck finding one in Python for these subfields of statistics. There's a book on Bayesian hierarchical analysis in ecology, and it's in R.
4. ggplot2 is amazing for static graphics. R doesn't have good dynamic graphics out there, and I'm kinda meh on Shiny. If you hate the syntax, then you may learn to appreciate it by reading about it from the creator: https://www.r-bloggers.com/a-simple-introduction-to-the-grap...
[+] [-] hendiatris|7 years ago|reply
This is a very bad example of what factors are for in R, because it makes it seem like factors are for defining variables or keys in key value pairs. You can use them for that, but it isn't the intended use. A better example would be:
suppose you were comparing the amount of sugar in fruits based on several growing locations, and you had three columns:
| Fruit | Location | Density (g/L) |
Fruit would be a factor variable (let's say it takes the possibilities of apple, banana, orange), and location could be too, if it were a discrete set of possibilities (as opposed to lat/lon coords)
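If it helps anyone coming from Python, the idea behind a factor can be sketched in a few lines: store each label as a small integer code plus a list of levels. This is only an illustration with made-up data; `as_factor` is a hypothetical helper, and it uses 0-based codes where R's are 1-based.

```python
def as_factor(values):
    """Encode labels as integer codes plus a sorted list of levels."""
    levels = sorted(set(values))
    index = {level: code for code, level in enumerate(levels)}
    return [index[v] for v in values], levels


fruit = ["banana", "apple", "orange", "apple", "banana"]
codes, levels = as_factor(fruit)

# Counting observations per level is then just counting codes.
counts = {level: codes.count(code) for code, level in enumerate(levels)}

print(levels)
print(codes)
print(counts)
```

The fixed set of levels is the point: a typo such as "aple" shows up as an unexpected extra level instead of hiding in a free-text column.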
This author seems to forget that R was built for working with data in an analytical setting, unlike all of the languages he's comparing it to. It has crept into other areas, but that seems to be because in the hands of a skilled user it is far easier to implement a data analysis solution. I'm sure someone will come in and say how much better pandas is, but on small datasets I'll stick with R, especially with how brittle and buggy matplotlib is.
[+] [-] minimaxir|7 years ago|reply
That is the approach for tidy data, which is used a lot in the R tidyverse (http://tidyr.tidyverse.org/articles/tidy-data.html)
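A rough sketch of what "tidy" means in practice, in plain Python with invented numbers: a wide table with one column per location is reshaped so that each row is a single observation. The helper `to_long` and the column names are assumptions for illustration, not tidyr's API.

```python
# Wide layout: one row per fruit, one column per growing location.
wide = [
    {"fruit": "apple", "farm_a": 10.4, "farm_b": 11.1},
    {"fruit": "banana", "farm_a": 12.2, "farm_b": 12.9},
]


def to_long(rows, id_col, key_name, value_name):
    """Turn every non-id column into its own (key, value) row."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue
            long_rows.append({id_col: row[id_col], key_name: key, value_name: value})
    return long_rows


tidy = to_long(wide, id_col="fruit", key_name="location", value_name="density")
for row in tidy:
    print(row)
```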
[+] [-] wdkrnls|7 years ago|reply
[+] [-] vorg|7 years ago|reply
HN was started by someone who wrote a book on, and flipped a startup using, Lisp. Python and Go are both used a lot by a buyer of startups, and HN exists to deliver startups, or their IP, to those buyers. R is more of a language for helping its users do data analysis, perhaps in corporate offices, and hence doesn't have as much use for HN's business purpose. Submissions on Python and Go are more likely to stick to HN's front page.
[+] [-] ACow_Adonis|7 years ago|reply
I use lisp and R. While R evolved from lisp, which lets me understand how it does certain things, I don't know if I'd describe it as more like lisp than python.
Indeed, this is the analogy I use to explain to friends why I have a strong emotional distaste for R:
Imagine you grew up as a heterosexual male. In your early years, you have fond memories of a young girl whom you had a fling with.
She drops off your radar, and you run into her 30 years later. She's gotten breast implants, botched her face with plastic surgery, and went through a rather traumatic divorce and reinvention of herself.
To your friends who lived on an island where there were no women, were kept in basements, and were regularly beaten by other stats programs, she might even be beautiful, and she certainly pays attention to the base desires they crave.
To you, she'll always be a mangled shadow of her former self and what could have been...
[+] [-] jakubp|7 years ago|reply
I write a script. It doesn't work. I don't know why. I look at the error message, and then google for 30 minutes to understand what it really means - which parts of the code broke, why, how to fix them. Because none of the 3 things (which, why, how) is easy to get to.
OK, I fix it, having learned something new (like that there are infinite special cases with almost any functions).
I commit it to repo, go for coffee. In the afternoon, a colleague asks how to run that code. Well, it was a simple script, half a page, what's the problem?
I take a look, and on their machine it doesn't run. We don't know why. An hour later we discover she has some R profile file with a setting that changes behavior of some standard library... and she also has different encoding set as default, and so on, and so forth... whatever. I don't know why runtime environment encoding changes behavior of code that only deals with numbers, but hey! It's interesting at least. We fix it, we are happy.
A few days later I run the script again. It works. The result doesn't look right though. It's mostly zeroes. Hmm.
I run it a few more times, playing around with input, trying to figure out what's up.
OK, after a few minutes I realize there's lots of red color that flashes on running the script on my screen - just so fast I barely see it.
It turns out half the code isn't really running, the script just ignores it though (errors do NOT stop the code from running), and keeps going. It produces partial output happily announcing it finished.
That is the most serious mindfuck. Everything is OK, says the prompt, here's your 1 megabyte result of the calculation, oh, just don't look at the numbers, because I haven't really run any of the code... I couldn't find one of the functions.
I sit there wondering. Which is worse: the fact that every time I try launching the script something else is happening, or the fact that the runtime environment by default will return garbage with NO warning at the end (which is the only thing you see on screen) but with a million warnings in between (which you won't see unless you have really good reflexes...).
Which is worse?
I decided at some point, that I want a language to fail, and to always give me the same result. An error, an exception, this should kill the program and shout as loud as possible "Won't give you anything". Also I want code that ran yesterday to run today, and to run on my colleague's machine, and on a newer version of R. This was never our experience.
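A minimal Python sketch of the behaviour being asked for, with invented measurements: validate first, raise on anything suspicious, and never hand back a partial result. `mean_density` is a hypothetical function, not part of any library.

```python
import math


def mean_density(values):
    """Average a list of measurements, refusing to return a partial answer."""
    if not values:
        raise ValueError("no measurements supplied")
    for v in values:
        if not isinstance(v, (int, float)) or math.isnan(v):
            raise ValueError(f"bad measurement: {v!r}")
    return sum(values) / len(values)


ok = mean_density([1.0, 2.0, 3.0])

try:
    mean_density([1.0, float("nan"), 3.0])
    outcome = "silently continued"
except ValueError as exc:
    outcome = f"stopped: {exc}"

print(ok)
print(outcome)
```

An uncaught exception would end the script with a traceback as the last thing on screen, which is the "shout as loud as possible" part.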
[+] [-] wdkrnls|7 years ago|reply
[+] [-] peatmoss|7 years ago|reply
I keep thinking someday I’ll try and build some of the missing stats infrastructure in Racket. The problem at present is time and that this stuff is real work!
[+] [-] fermienrico|7 years ago|reply
[+] [-] geoalchimista|7 years ago|reply
[+] [-] andrewla|7 years ago|reply
Notably the use of + instead of chaining operators, the use of a custom "ggproto" object system instead of S3 (which makes extensibility a nightmare), and the superfluous presence of the aes() function (rendered unnecessary by better lazy evaluation tricks not really well-explored at the time).
[+] [-] cwyers|7 years ago|reply
...I am confused here. If you're testing for equality, R requires you to use == and not =. If you try to test for equality with =, it throws an error instead of treating it as an assignment. That's good. But who is trying to test for equality with <-?
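Python takes the same stance, for what it's worth: a bare `=` inside a condition is rejected before anything runs. A small sketch that checks this with the built-in `compile()`; the helper name `compiles` is made up.

```python
def compiles(source):
    """Report whether a snippet is syntactically valid Python."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


assignment_in_test = compiles("if x = 3:\n    pass\n")   # '=' where '==' was meant
equality_test = compiles("if x == 3:\n    pass\n")

print(assignment_in_test, equality_test)
```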
[+] [-] bo1024|7 years ago|reply
[+] [-] zaphar|7 years ago|reply
[+] [-] jdwyah|7 years ago|reply
[+] [-] notafraudster|7 years ago|reply
I clicked because I was a programmer for 15 years before I used R, and I have subsequently developed and shipped R packages, so I feel like I'm in a pretty good position to get the visceral, cathartic, "argh" the writer here was going for.
[+] [-] rossdavidh|7 years ago|reply
[+] [-] ggm|7 years ago|reply
The second time, I had a purpose and found enough code to copy to achieve it. I was rewarded.
The third time, I had a more complex problem demanding use of JSON from Elastic Search, and found that the two packages out there in git are basically orphanware, use dplyr in extremely confusing ways, and offer little or no advantage to simplistic HTTP fetching and direct to JSON parsing. Which is a huge shame, because the idea of an elastic search abstraction is very attractive. But. "it just didn't work out of the box"
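For reference, the "direct to JSON parsing" route really is short. A sketch in Python using only the standard library, run against a hard-coded sample rather than a live server; the field names `host` and `latency_ms` are invented, and the sample assumes the usual `hits.hits[]._source` layout of a search response.

```python
import json

# Stand-in for the body of an HTTP search response (made-up documents).
sample = """
{"hits": {"hits": [
  {"_id": "1", "_source": {"host": "a", "latency_ms": 12}},
  {"_id": "2", "_source": {"host": "b", "latency_ms": 30}}
]}}
"""


def source_rows(body):
    """Pull the `_source` document out of each hit in a search response."""
    return [hit["_source"] for hit in json.loads(body)["hits"]["hits"]]


rows = source_rows(sample)
print(rows)
```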
I am very clear I am an R "consumer", not an R developer. But, at this point, absent Shiny and a GUI, I think that Python and Numpy have as much to offer me, basically.
Some people say the syntax is FP friendly. I have been trying to learn FP in Haskell and I think R is about the worst notation you could invent to sell FP.
[+] [-] digitalzombie|7 years ago|reply
R is such a bad language to learn FP in.
[+] [-] floki999|7 years ago|reply
[+] [-] kasperset|7 years ago|reply
99% of the time I use tidyverse with no noticeable impact on performance. For that occasional 1%, I must admit the data.table package works out really well. tidyverse pipes are so unixy that it's easy to transition to commands such as cut, head, sort and column if needed, without any mental contortion.
I have used Python occasionally and with method chaining, it can almost simulate the "dplyr" like syntax. However, it is hard to find some obscure statistical test out of the box which is easy in R.
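A rough sketch of that chaining style in plain Python, with made-up data: a tiny `pipe` helper (hypothetical, built on `functools.reduce`) feeds a value through filter, sort and select steps in order, loosely like a dplyr pipeline.

```python
from functools import reduce


def pipe(value, *steps):
    """Feed `value` through each step in order, like a chain of piped calls."""
    return reduce(lambda acc, step: step(acc), steps, value)


rows = [
    {"fruit": "apple", "density": 10.4},
    {"fruit": "banana", "density": 12.2},
    {"fruit": "orange", "density": 9.1},
]

result = pipe(
    rows,
    lambda rs: [r for r in rs if r["density"] > 10],                   # filter
    lambda rs: sorted(rs, key=lambda r: r["density"], reverse=True),  # arrange
    lambda rs: [r["fruit"] for r in rs],                               # select
)

print(result)
```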
[+] [-] Gatsky|7 years ago|reply
To me, this is all the encouragement I need to use R.
[+] [-] CharlesMerriam2|7 years ago|reply
A few will deconstruct the criticisms and look for the small documentation or even language changes to solve something.
I applaud the final group.
[+] [-] dre85|7 years ago|reply
Hmmmmm, okay after 20 minutes of squinting at the script I see there's a lowercase letter at the header name of the last column somewhere in the middle of the 200 lines...now we can move on to debugging the next "Error"...
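This kind of bug is cheap to catch up front. A Python sketch with invented column names: check the header against the expected names before doing anything else, and report near-misses that differ only by case. `check_columns` is a hypothetical helper.

```python
def check_columns(header, expected):
    """Raise one readable error listing every expected column that is absent."""
    missing = [name for name in expected if name not in header]
    if missing:
        # For each missing name, list header entries that match ignoring case.
        near = {m: [h for h in header if h.lower() == m.lower()] for m in missing}
        raise KeyError(f"missing columns {missing}; case-insensitive matches: {near}")


header = ["Fruit", "Location", "density"]  # note the lowercase last column

try:
    check_columns(header, ["Fruit", "Location", "Density"])
    report = "ok"
except KeyError as exc:
    report = exc.args[0]

print(report)
```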
[+] [-] haihaibye|7 years ago|reply