
One Year with R

247 points | _dain_ | 4 years ago | github.com

258 comments

[+] tel|4 years ago|reply
R, and by R I mean R+tidyverse, is the world's best graphing calculator attached to an OK scheme.

By which I mean that R is a highly optimized, well-oiled machine if you're using it for its highly optimized, well-oiled purposes. I tend to have notebooks full of tiny fragments like this:

    dat_min %>%
      group_by(ymd = make_date(year(date), month(date), day(date))) %>%
      summarize(vol_btc=sum(vol_btc), vol_usdt=sum(vol_usdt), tradecount=sum(tradecount)) %>%
      ungroup() %>%
      pivot_longer(cols=c(-ymd)) %>%
      ggplot(aes(ymd, value)) + 
      geom_line() +
      facet_grid(name ~ ., scales="free_y")
It's madness if you're not familiar with the tidyverse, but 3 dozen fragments like this is enough to eviscerate a fresh data set. Almost any question you can dream of is a 3-20 line set of transforms away from a beautiful plot or analysis answering your question. Very notably, this includes some of the finest modeling tools available today.

Terseness here is a huge advantage as well because in many data analysis workflows you are rerunning that same 10 line snippet over and over, making small changes, adjusting to eventually visualize the thing you're looking for perfectly. Having all of that in the same small block is ideal.

Finally, for the non-trivial number of folks in this specific scenario, the integration between Stan and R/RStudio is top-notch and makes using both tools very pleasant.

You can replicate all of this in Python, but optimal Python/Jupyter is still a far cry away from R/RStudio for these specific sorts of tasks.

[+] peatmoss|4 years ago|reply
> best graphing calculator attached to an OK scheme.

I discovered "How To Design Programs" somewhere late in my first year of using R. Like most beginning R coders with nominal experience in other languages, I wrote a lot of monolithic scripts in a very imperative style. HtDP gave me a mental framework for decomposing larger problems into bite-sized chunks. The lispy roots of R lent itself particularly well to the model of thinking presented in that book.

Ever since then, I've pined for the graphing calculator parts in a more modern Scheme. When ggplot and then the tidyverse (née hadleyverse) came on the scene, I was even more convinced that Scheme, especially Racket, was the ideal future for data science. If R could support a large ecosystem like tidyverse, just imagine what the metaprogramming facilities of Racket could do!

But I think those graphing calculator parts are hard to reproduce. Attempts to clone ggplot2 fall short year after year, because most other languages don't have grid graphics to build on top of. R is a deep ecosystem on "an OK scheme," which is damned hard to beat.

Aside: my first year with R was in an urban planning master's program, and I was terrified of my first big-kid statistics course (taught in SPSS). I decided I'd give myself bonus work by learning R. While it was absurd to be doing my stats homework in SPSS, then R, then reviewing HtDP on top of the rest of my course load, I did ace that stats course. :-)

[+] delusional|4 years ago|reply
> R is a highly optimized, well-oiled machine if you're using it for its highly-optimized, well-oiled purposes.

This hits home for me. We are just starting to use R for risk modeling where I work. R, more than any language I've ever used, makes me appreciate "worse is better". From a theoretical "aesthetic" perspective R is a mess. Yet for data processing all those theoretical concerns don't matter. It just works.

It's honestly kind of humbling that something so theoretically messy can be so practically coherent. It makes me question my assumptions about simplicity.

[+] dm319|4 years ago|reply
This is just a quick example - I would be grateful if people could recreate this brief look at UK COVID figures in another language:

  library(tidyverse)
  library(scales)
  
  download.file(url = "https://api.coronavirus.data.gov.uk/v2/data?areaType=overview&metric=covidOccupiedMVBeds&metric=newAdmissions&metric=newCasesBySpecimenDate&metric=newDeaths28DaysByDeathDate&metric=newPeopleReceivingFirstDose&format=csv", destfile = "./data.csv", method = "wget")
  
  read_csv("./data.csv") %>%
  pivot_longer(names_to = "Data", cols = c(newCasesBySpecimenDate,
                       covidOccupiedMVBeds,
                       newAdmissions,
                       newDeaths28DaysByDeathDate)) %>%
  mutate(Data = factor(Data)) %>%
  mutate(Data = recode_factor(Data, newCasesBySpecimenDate = "New Cases",
         newAdmissions = "Admissions",
         newDeaths28DaysByDeathDate = "Deaths",
         covidOccupiedMVBeds = "Ventilated")) %>%
  ggplot(aes(y = value, x = date, colour = Data))+
  geom_point(size = 1, colour = "gray", alpha = 0.6)+
  geom_smooth(method = "loess", span = 0.1)+
  labs(y = "Daily rate", x = "Date", colour = "UK COVID-19")+
  scale_x_date(date_breaks = "months", date_labels = "%b-%y")+
  scale_y_log10(labels = comma(10 ^ (0:5),
                 accuracy = 1),
         breaks = 10 ^ (0:5))+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
[+] CapmCrackaWaka|4 years ago|reply
I also use R for any heavy data manipulation, but I primarily use the data.table package. The efficiency that both of these packages unlock is absolutely unparalleled in any other tabular data manipulation library, in any other language that I have used. And R has the top 2!!

My skin writhes every time I need to type:

table.loc[(table.column > 2) | (table.column2 < 3)].reset_index(drop=True)

when I want to subset a table.
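For comparison, a minimal sketch of the same subset in data.table, assuming a toy table `dt` with numeric columns `column` and `column2` (hypothetical names mirroring the pandas line above):

```r
library(data.table)

# Toy table with the same (hypothetical) column names as the pandas example.
dt <- data.table(column = 1:5, column2 = 5:1)

# Rows where column > 2 OR column2 < 3. Row numbers are implicit in
# data.table, so there is no reset_index(drop = TRUE) step to undo.
subset_dt <- dt[column > 2 | column2 < 3]
```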

[+] qudat|4 years ago|reply
My wife is a researcher and started delving into doing her own statistical analysis. It's been fun (and frustrating) learning R with her. I agree that dplyr and the tidyverse are some fantastic packages for a software engineer who thinks about spreadsheets as SQL tables.

I would say the most frustrating part about RStudio is that it is a workbook where you can execute code based on your cursor. For my wife, these workbooks become a total mess because things aren't necessarily run sequentially.

[+] hadley|4 years ago|reply
I love the phrase "eviscerate a fresh data set" :D
[+] stadeschuldt|4 years ago|reply
Great stuff. Could you maybe share a bit of these "3 dozen" scripts? This could be super helpful.
[+] ekianjo|4 years ago|reply
Looks like your notebooks focus on crypto currencies :-) Good use case for data analysis!
[+] crispyambulance|4 years ago|reply
I was expecting a rant, but the OP's article is actually very thoughtful. He definitely knows what he's talking about.

The thing about R, for me and many others, is that it's very much an everyday grind language. Especially with RStudio, its natural domain is among the "notebook" languages like Python, Julia, Matlab, and Mathematica, but with a clearer focus on the tasks of data analysis. I just tell the BI-tool people that R is Excel on 'roids.

R frustrates me a lot, however. But I think the frustration comes out of the fact that when I am using R and get stuck, I am always in the middle of doing something that I need to get done and I don't feel like diving into a long "vignette". Moreover, the documentation is usually too terse and generalized for me to just understand it immediately. Even though I've been using R for years (albeit in fits and starts rather than continuously), there are things about it that I've just never picked up-- I just DON'T KNOW (or care) what F S3 and S4 mean. Unlike the OP, who clearly knows more R than myself, I grit my teeth when I am looking at docs and see the "..." in the arg list.

I suspect that this is part of the heritage from R's beginnings. I once tried to read John Chambers's book but found the presentation completely ass-backwards and impractical for my immediate needs. The Tidyverse has been great: it's far more consistent, and ggplot is a kick-ass tool to have in your box. The drawback is that it makes Base-R seem really alien, and if you want to be good at R, you have to know more than just the Tidyverse, IMHO.

[+] epistasis|4 years ago|reply
The tidyverse docs are the only ones I know of with that super-frustrating `...` of impenetrable, gnostic "documentation". In general the tidyverse documentation is horrible, almost as bad as typical Python docs, IMHO. The rest of base R, by contrast, is wonderfully documented, in my opinion.
[+] clove|4 years ago|reply
I've used R for 19 years and do not have any other programming language ability. I am curious: What is frustrating to you about R relative to other languages?
[+] indeed30|4 years ago|reply
I love R more than any other language I have ever used. Perhaps more than any piece of software I've ever used. All of these points are valid, and yes, it's messy, and if you try to write the same type of code that you would in Python, it will frustrate you.

And yet.. it somehow works. It makes data analysis and statistical modelling a pleasure. It somehow gives off a sense of lightness, and makes it easy to investigate and explore. I would guess I am genuinely 2x as productive in R as I would be in Python on similar tasks.

I know it's not a "proper" language, but I think that, maybe, not everything has to be exactly like "proper" software engineering?

[+] azalemeth|4 years ago|reply
I very much agree with this. I use python for (different types of) data analysis too, and in python in particular it feels like the "boilerplate" to "science" ratio is rather high in the direction of "boilerplate". R manages to abstract this away very effectively, as the article highlights.

The beauty of R is that you can write one line of code and use some hot-off-the-PhD-thesis cutting-edge-just-published-in-J.-Stat.-Soft-chunk of statistical analysis in your totally different, completely whacky problem, and it's fast, and (by and large) works.

Of course, that's its biggest problem as well. Scientifically, it will quite happily give you a 150 mm howitzer to aim at your foot, assuming you know best.

[+] horsawlarway|4 years ago|reply
Coming from Matlab, I have the opposite feeling.

I truly, genuinely dislike the language. I think it's very productive, and I appreciate that Matlab costs an arm and a leg (and god help you once you start paying for some of the nicer packages on top) - but Matlab has spoiled me immensely on the language front.

To me, Matlab feels like a language that was designed with an intent to appeal to folks with some understanding of traditional procedural programming, but nudged into treating matrices as first class citizens.

R feels like a language that was built for people who were using excel, and have never written a line of code in their life - it's riddled with completely unintuitive, frustrating, intentionally obtuse operators and terms for things that have perfectly fine definitions in normal programming.

The difference is that I have 20+ years of programming experience (including quite a bit of functional programming) that I can easily port over to Matlab, and which becomes literal baggage trying to use R. The end result is that I will use R, but I basically always walk away frustrated and infuriated, even when the problem is solved.

[+] hadley|4 years ago|reply
100% this :)
[+] Gatsky|4 years ago|reply
Yeah, and the growing user base, widening ecosystem, and continual stream of analysis packages being written only in R suggest that lots of others agree.

An important factor not often mentioned is that I think R really helps individual developers/very small teams to be productive.

[+] Breza|4 years ago|reply
I feel the exact same way! I've used R for the past decade. Once you learn the philosophy behind it, it just works. Yesterday my boss asked me a question about a dataset and I wrote code to analyze it while talking through the problem in real time.
[+] ourlordcaffeine|4 years ago|reply
The main issue I've had is speed. As soon as you have problems that can't be vectorized, models that take 30 hours to run in R take 30 minutes in python.
[+] SubiculumCode|4 years ago|reply
Thanks for expressing how I feel about R so succinctly.
[+] em500|4 years ago|reply
The common trope with R is that statisticians love it and developers hate it.

The main reason that statisticians love it is that the libraries useful to them are much better in R than elsewhere (though Python keeps encroaching on that turf, and "real developers" dislike Python a lot less than they do R).

The main reason that developers hate it is that it is very unlike almost all other languages they're used to. This is valid, since outside the narrow domain of statistics there's probably nothing that R does better than other languages. So for a dev who occasionally dabbles with R by necessity, the otherness brings nothing but frustration.

Still, I wonder how much criticism there is against R as a programming language that is not some variation on "this works very differently from other languages". IMHO the subsetting syntax and the countless x-apply variations are big warts. I'm not a big fan of the Tidyverse, and even less of the schism between base R and Tidy-R. I've read some seemingly fundamental criticism of R's deficient scoping rules, but I'm not nearly knowledgeable enough to judge its merits.

I guess it doesn't help that almost nobody learned R as their first computer (as opposed to statistics) language. Personally, I learned C, Matlab, Python/numpy, SQL, R in that order. R does seem to be quirkier than all the others, except maybe SQL. But I don't dislike working in R any more than working in any other language.

[+] yarky|4 years ago|reply
> it doesn't help that almost nobody learned R as their first computer (as opposed to statistics) language.

Aside from two statisticians I had as professors, I have yet to meet someone with a deep understanding of statistics who doesn't speak R as a first language ...

I found it way easier to grasp the meaning of statistics by playing with R than by reading the maths.

[+] pantulis|4 years ago|reply
I had some Matlab experience about 3 decades ago. What's your take on Matlab vs R as programming languages?
[+] mellavora|4 years ago|reply
> "real developers" dislike Python a lot less than they do R

I thought that was real Scotsmen.

Because real Scotsmen prefer:

table.loc[(table.column > 2) | (table.column2 < 3)].reset_index(drop=True)

to

table[column > 2 & column2 < 3, ]

and everyone knows this!

not to mention, if you aren't managing 100 virtual environments and 100 conda environments (with different syntax for requirements), you aren't a real Scotsman!

[+] dash2|4 years ago|reply
I think this is really interesting. The author certainly isn't an expert, for example `result[which(result < 0.5)] <- 0` is a mistake for `result[result < 0.5] <- 0`.

But that's just why it's useful - R is great when you are an expert, but becoming an expert takes years. The perspective of new users is really important. (I've been using R almost 20 years, have written several packages, and still feel like an amateur. Indeed, I'd never heard of `**` as an alias for `^` until today; nor `sequence`, which apparently has always been in base; and I still can't remember what `sweep` does.)
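The little-known corners mentioned above are easy to check at the console:

```r
result <- c(0.2, 0.7, 0.4)

# which() is redundant here: logical subscripts work directly in
# subset-assignment (the two forms differ only when NAs are present,
# since which() drops them).
a <- result; a[which(a < 0.5)] <- 0
b <- result; b[b < 0.5] <- 0
identical(a, b)              # TRUE; both are c(0, 0.7, 0)

# `**` parses as an alias for `^`
2 ** 10 == 2 ^ 10            # TRUE

# sequence() concatenates 1:n for each element of its argument
sequence(c(3, 2))            # 1 2 3 1 2

# sweep() applies an operation (subtraction by default) along a margin:
# here c(10, 20, 30) is subtracted column-wise.
sweep(matrix(1:6, nrow = 2), 2, c(10, 20, 30))
```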

I thought some of these arguments were better than others. True that base R regex is confusing and messy (and that stringi/stringr are improvements). False that allowing string concatenation with `+` would be a good idea. That's just a footgun waiting to go off, given that R also is weakly typed. Expecting `nchar(1000)` to magically work seems naïve. `<<-` (roughly, global assignment) is an ugly necessity and a code smell, not a cool language feature.

An awful lot of these problems are fixed, or try to be fixed, in the tidyverse. Not using tidyverse is a bit unusual because most beginners nowadays, I think, start with the tidyverse more than with base R.

For me the worst part of R is simply that it fails silently. This is really deadly, especially when you are producing scientific results. There are so many places where R will plug gamely on after you have done something deeply inappropriate. Given how badly scientists code, one has to worry.

I don't agree that "R won’t change" is the base problem. It's not so simple. R is used for science. I like very much that my code from 2008 will probably still work if someone wants to replicate my results. I appreciate the R-core team's work in making this true. There are genuine trade-offs here.

If you want emotional relief, it's worth following https://twitter.com/whydoesR.

Maybe Julia is the way forward? Or is R "worse is better"?

[+] bluenose69|4 years ago|reply
> Maybe Julia is the way forward?

Julia is well worth learning, if you do computationally-expensive work. It is kind of a pain to use interactively, though. I use both R and Julia in my research. Think of Julia as the new Fortran, though, not the new R.

[+] dm319|4 years ago|reply
I think a lot of the problem with comparing R to other languages is that a lot of people don't get the problem space that R is working in. Science deals a lot with categorical variables, missing data and high-dimensional data, and the 'table' or 'dataframe' is adept at storing and working with this information. Under the hood it's just a load of optimised Fortran code working on matrices, but the code clearly shows what kinds of data manipulations and transformations you are doing to eke the right information and visualisations out of the dataset.

I see problems when people take an imperative approach to solving numerical problems, and something like Python is better suited to that. Also, R isn't really set up to work with matrices like Matlab/Julia are.

[+] toto444|4 years ago|reply
The points the author mentions are fair but something feels amiss. I have used R heavily and still use it from time to time and I never use most of the functions mentioned in the post. For instance I have never used switch().

R is for data manipulation. 90% of what I do in R is manipulate dataframes or matrices and then run machinelearningmodel(mydataframe) or ggplot(mydataframe). And for this it is incredibly efficient. You can rightly argue that some elements of the language are quirky but that's missing the point.

> Asked over 100 Stack Overflow R questions.

As a tangent, I find a hundred questions asked in the first year of using a very mature language to be a lot.

[+] dm319|4 years ago|reply
Yes, I think if you are using switch() for an analysis in R, you're either using the wrong language, or you're doing R wrong.
[+] uniqueuid|4 years ago|reply
What always surprises me is how many people make beautiful, lovingly-crafted band-aids to the language's warts and problems. Not just code + packages but social band-aids too.

In a way, you could argue that the entire tidyverse is a huge effort of a band-aid.

So for all the irritating design choices and idiosyncrasies, R is still a network of islands that work incredibly well for people, as long as they don't ever go to sea.

[+] CornCobs|4 years ago|reply
I've written an interpreter for R (a subset; it was for school and I left out some features like S4 and the condition system) so I have done some pretty deep dive into the language reference and GNU R source.

I agree with the author's sentiment - I love a lot of what R has, but there are a lot of small madnesses.

There are so many unique PL ideas in R (maybe not actually unique, but certainly unique among common languages today):

- first-class environments
- named, default parameters, and even the `...` parameter, which encourages the pattern of hierarchical library functions: there's one large, customizable main workhorse function and many wrapper functions that specify some defaults or add some behavior, but all the underlying customizations are exposed through `...`
- copy-on-write as a default
- the ability to choose evaluation strategy

But I also wonder how many of these cool ideas would actually work well in a saner language.

[+] tharne|4 years ago|reply
R is a truly terrible language with a handful of bright spots, such as its visualization libraries.

The boost you get from the slightly better expressiveness of R over something like Julia or Python is not worth the headaches you'll run into down the road in trying to maintain whatever you wrote 6 months later, or God forbid, trying to integrate your code into someone else's work.

R was my first language, and in hindsight that was a HUGE mistake. So much of the R code out there is horribly written, and even when it isn't, you still have to deal with all of the issues the author points out here. If you pick up R as your first language, you will end up picking up all sorts of bad habits.

R is fine if you're working solo and you don't plan on maintaining or reusing your code. For everything else, R is garbage. It took me a year or more to undo all of the bad habits I picked up learning R.

I don't agree with the "worse is better" comparison in the comments here. "Worse is Better" was meant to refer to the idea of "Don't make the perfect the enemy of the good", among other things. It was not meant to be used as a justification for poor design. If anything, python for data analysis fits the "worse is better" philosophy much better than R. It's not as well optimized for data work compared to R, but it's much simpler, more consistent, less error prone, and it plays well with others.

[+] stewbrew|4 years ago|reply
> R has two types of empty string: character(0) and "".

I understand it's frustrating trying to use a language you don't understand. And instead of reading the language manual you go on rambling.

"" is an empty string (almost) as you know it from other languages.

character(0) is an empty vector of type character (i.e. a vector with no elements). This vector doesn't even contain an empty string.

R is a vectorized language. You almost always deal with vectors. "" actually is a character(1), a character vector of length 1. Once you understand this, there is a chance for you to enjoy R.
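The distinction can be verified in a couple of lines:

```r
# "" is a length-1 character vector containing the empty string;
# character(0) is a character vector with no elements at all.
length("")              # 1
length(character(0))    # 0
nchar("")               # 0

# Concatenation makes the difference visible: character(0) adds nothing.
c("a", "")              # two elements: "a" and ""
c("a", character(0))    # still one element: "a"
```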

[+] chubot|4 years ago|reply
> The popularity of the Tidyverse is a major blow to your motivation to learn R. Why would anyone want to learn a language that is treated as secondary to some packages? Worse still, if that turns out to be the best way to use R, then you’re forced to admit that R is a polished turd with a fragmented community.

As others have mentioned, just use tidyverse. I picked it up 4 years ago, and last week I went back to the code I wrote then.

I was productive in minutes. I could read the code, modify it, and easily test it in the REPL. The docs for dplyr are good.

ggplot2 is still awesome and the docs are good there too. ggplot2 is the fastest way to figure out what you want and make a pretty plot.

(However one thing that still annoys me is that R moves faster than Debian. So it's possible to do install.packages() in R, and it will break telling you your Debian R interpreter is too old. There is no easy solution for this, just a bunch of workarounds)

-----

OK, sure you can call it a polished turd, and to some degree that's true. But a polished turd is better than just using ... a turd!

The error messages in R are not quite as good as Python, but I wouldn't call it a problem. I'm able to localize the source of an error, even when using tidyverse.

My article comparing tidyverse to some other solutions:

What Is a Data Frame? (In Python, R, and SQL) http://www.oilshell.org/blog/2018/11/30.html

----

> But would I recommend learning it to anyone else? Absolutely not. We can do so much better.

I would recommend it, with the caveat that it's one of the hardest languages I've had to learn. But that is partly because it changes how you think. And if you have a certain type of problem, then you have to change how you think, or you'll never get it done. Data analysis is surprisingly laborious, even for people who have, say, written compilers and such.

[+] dxbydt|4 years ago|reply
Lot of useful insights in the comments here. I wanted to address one specific comment -

>can't remember the last time I saw a project someone did in R get very much traction anywhere...the only time people talk about R on the internet is to discuss the language itself which is definitely frustrating

There is a lot of R deployed in industry, even in Silicon Valley, but you have to be in-the-know. R gets plenty of use in statarb & model checking in finance - speaking from personal experience at GS & BofA/ML. My one non-trivial project at Twitter involved working with this team building a model & I remarked - hey, this can be done rather easily if you use this library in R - and the team lead says, yeah, that's how we're doing it! But I thought we were a Scala shop, I said. So he says, yeah, but imagine building that entire library in Scala from scratch, it'll take forever! So I enquired how he gets it done - you basically spin up a socket server & the JVM sends R commands plus data as payload over the socket, the server runs R and returns the result of the model back as a string, boom, done! I said it was kinda janky & he says - I won't tell if you don't! So that's R for you - it gets the job done & it's fast & somewhat messy, but it is used everywhere, yet people won't openly admit to it because it's a 30-year-old language & we all want to be using the latest & greatest tool.

I now work at a news startup with a few million users, & all of the news personalization is done in R. So when these millions of viewers watch TV, the piece of code that decides which news clip should be shown ahead of which other news clip & which clip comes after - all of that is decided by a block of R code that I wrote. ~ 300 lines of R, uses quanteda, tidytext & parallel under the hood. Pretty much everything I do involves mcmapply, which parallelizes your compute & uses as many cores as you specify. But that's sort of the thing with R - you have to know which functions/libs to use & which ones to avoid. Just switching from tm to quanteda got us a 200% bump in perf. Switching sapply's to mcmapply was another winner. These things aren't documented cleanly - you have to keep up with cran, experiment & see what works best for you.
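The mcmapply pattern described above looks roughly like this - a sketch with a made-up scoring function, not the actual ranking code (and note that forking is unavailable on Windows):

```r
library(parallel)

# score_clip() is a hypothetical stand-in for the real ranking function.
score_clip <- function(clip_id, weight) clip_id * weight

# mcmapply() is a drop-in for mapply() that forks worker processes;
# mc.cores caps how many cores are used.
scores <- mcmapply(score_clip,
                   clip_id = 1:8,
                   weight  = seq(0.1, 0.8, by = 0.1),
                   mc.cores = 2)
```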

[+] CapmCrackaWaka|4 years ago|reply
I would say about 90% of the posts / articles / comments I see on the internet which discuss R are usually of the "meta" format. They talk about R's strengths or weaknesses, about the difference between R and Python, about how much they love or hate R, or any other high level subject.

I can't remember the last time I saw a project someone did in R, or a tutorial on how to do something in R, get very much traction anywhere. It seems the only time people talk about R on the internet is to discuss the language itself (which is definitely frustrating), and it's getting old. Even this awesome, comprehensive document, which I would usually be foaming at the mouth to read, has me going "meh". I'm tired of the subject.

[+] mellavora|4 years ago|reply
> I can't remember the last time I saw a project someone did in R, or a tutorial on how to do something in R, get very much traction anywhere.

well, you know, I'm not very active in C++ any more, and I haven't seen an article in over a decade on C++ which received any traction at all. So I guess C++ isn't getting any traction any more either.

[+] disgruntledphd2|4 years ago|reply
This is definitely true on HN, at least. I think the vast majority of R-users are just plugging away on their domain specific problems daily, and tend not to participate in these conversations.

Dark-matter statisticians, I guess?

[+] ggrothendieck|4 years ago|reply
What needs to be added is that before R, the reproducibility problem in science was compounded by the fact that analyses were done with proprietary software, limiting communication and replication of those analyses. This was and continues to be a major problem, particularly in some fields, but at least now there is a common, widely used language that can be used to overcome it. I wouldn't focus on idiosyncrasies but rather on the major problem it addresses. Any large system will grow over time and have some inconsistencies, but after a while you learn the workarounds, so they are less important than the big picture.
[+] tgb|4 years ago|reply
On the contrary, R's packaging system is too broken for R to be reliably reproducible. No one specifies package versions or R versions. Base R has no way to install a specific version of a package. There's a package that lets you do that, but, well, you might need a specific version of it. Particularly if you need to run an old version of R to reproduce an old script, it may be impossible to use any standard tool to install the correct packages thanks to this problem - the version of devtools that install.packages() gets won't be compatible with your old R, but you need that package to request another version. Instead everyone just ignores it and hopes package versions don't matter.
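For the record, a sketch of the kind of workaround being alluded to (assuming the add-on `remotes` package is the one meant; base R itself can only report what is already installed):

```r
# Base R can report an installed version but not request one:
packageVersion("stats")    # a package_version object for an installed package

# The usual add-on workaround (not run here): pin an exact version.
# remotes::install_version("dplyr", version = "1.0.0")
```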
[+] _Wintermute|4 years ago|reply
I don't see how R specifically addresses the reproducibility problem. It's been around for almost 30 years, and before its recent rise in popularity lots of science was done in C, Perl, Fortran, etc. Not to mention that actual dependency versioning is pretty poor: I struggle to run other people's R code after about 6 months (especially if they used the tidyverse, as it pulls in hundreds of unstable dependencies), nobody records what package versions were used, and functions are seemingly deprecated every week.
[+] AuthorizedCust|4 years ago|reply
I don’t like this. Much of this is:

1. pointing out that, like every other language, base R has idiosyncrasies

2. how use of R is more complex when you're largely ignorant of the tidyverse, which is crucial for the vast majority of today's use of R

3. frustration because you're using a language/ecosystem that's targeted at a few specific uses as a general-purpose programming language

[+] kgwgk|4 years ago|reply
It's a good read, but some of the complaints are hard to understand.

For example, in 4.5.1:

  Selecting and deleting at the same time doesn’t work either. For example, data[c(-1, 5)] is an error.
What would it mean for that to work? He seems to acknowledge that "selecting and deleting at the same time" doesn't make sense in 4.11.1

  Can you guess what data[-1:5] returns? I can’t either, so don’t ever try it. If you must know, it’s actually an error.
Also in 4.11.1:

  The : operator is absolutely lovely… until it screws you. The solution is to prefer the seq() functions to using : [....] As I’ve said, seq() and its related functions usually fix this issue.
Maybe the "related functions" fix some issues, but seq(a, b) is no different from a:b.
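The -1:5 surprise is pure operator precedence: unary minus binds tighter than `:`, so the expression builds an ordinary vector that then mixes negative and non-negative subscripts:

```r
# -1:5 is parsed as (-1):5, i.e. the sequence -1 0 1 2 3 4 5.
-1:5

x <- letters[1:5]
# x[-1:5]    # error: can't mix negative and positive subscripts
x[-(1:5)]    # probably what was intended: drop elements 1 through 5

# The same mixing rule is why data[c(-1, 5)] above is an error.
```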

In 4.11:

  Now what do you think names(foo) <- names(bar) does? Seriously, can you guess? I can think of roughly four realistic guesses. Is it even valid syntax? 
How is that surprising? Can the author also think of four realistic guesses about the effect of A[1,2] <- B[3,4] for example?

In 4.13:

  The index in a for loop uses the same environment as its caller, so loops like for(i in 1:10) will overwrite any variable called i in the parent environment and set it to 10 when the loop finishes. [...] This sounds awful, but I’ve never encountered it in practice.
Is it awful? The same happens in other languages like Python or C if I'm not mistaken.

  The plot() function has some strange defaults. For example, you need to have a plot before you can plot points [...]
I have no idea what that means. You can plot points using plot() without having a plot beforehand.

Edited to add: In 4.5.3:

  The $ operator is another case of R quietly changing your data structures.
Is it unexpected that when we extract an element from a data structure we get a different kind of data structure? Is A[1,1] another example of silently changing one data structure (matrix) to another (number)?
[+] kgwgk|4 years ago|reply
> Here’s a challenge: Find the function that checks if "es" is in "test". You’ll be on for a while.

grepl("es", "test")

[+] saeranv|4 years ago|reply
Personally, I feel the biggest problem with Python for math is the lack of a native vector datatype and our subsequent reliance on Numpy, which really disrupts the elegance/terseness of working with vectors/matrices.

First, there's the constant inelegance/clutter/inefficiency of having to cast into and out of arrays and lists, even when doing basic list comprehension. R, Julia, and Matlab are all vector-based languages (I think), so you avoid having this casting as much.

Secondly, having a native vector type means you don't have to worry about the performance penalty of operating directly on arrays if an existing prebuilt method exists. Since the efficiency of Numpy comes from calling its underlying C library, you're forced to memorize and use prebuilt Numpy functions rather than just use the more obvious and elegant array manipulations. For example, rather than calculating the cumulative sum like this:

  cumsum = list(accumulate([1, 2, 3, 4]))  # itertools.accumulate

 We have to do this:

  cumsum = np.cumsum([1,2,3,4])
(There are better examples, but this is all I can think of right now).

And once you add something like Pytorch tensors on top of this, we now have an additional layer of casting/redundancy/memorization of prebuilt functions!

[+] VariableStar|4 years ago|reply
Excellent read. I agree with a lot (after only a cursory read). One thing the author seemingly forgives R by not mentioning is how harsh and discouraging of beginners the community was at some early stage. That was my experience around 2002-2006.
[+] bluenose69|4 years ago|reply
R is designed for data analysis, not for general computing. Its syntax differs from that of other systems. Python's syntax also differs from other systems. Same for Matlab. And so on.

Non-uniformity imposes a burden that will be too much to bear, unless the system offers particular advantages. The fact that several systems co-exist is proof that the advantage-burden balance is favourable in each case.

There is no need to converge on a single tool. Carpenters need both saws and hammers.

In practical applications, language syntax is just part of the story. One must also consider the issue of available libraries. One thing that really stands out with R is its immense collection of well-vetted and well-documented packages. Python and Matlab -- the two main alternatives in my discipline -- fall far behind R in this respect. If there's a journal article on a new statistical technique, then there's a pretty good chance of a package written by same author. And, if that package is on CRAN (the repository for such things) then it has undergone quite rigorous testing on several types of computer, with several versions of R.