top | item 39572679

klmr | 2 years ago

> I can’t remember the last time a library incompatibility led to a show stopper.

Oh, it’s very common unless you basically only use < 5 packages that are completely stable and no longer actively developed: packages break backwards compatibility all the time, in small and in big ways, and version pinning in R categorically does not work as well as in Python, despite all the issues with the latter. People joke about the complex packaging ecosystem in Python but at least there is such a thing. R has no equivalent. In Python, if you have a versioned lockfile, anybody can redeploy your code unless a system dependency broke. In R, even with an ‘renv’ lockfile, installing the correct package versions is a crapshoot, and will frequently fail. Don’t get me wrong, ‘renv’ has made things much better (and ‘rig’ and PPM also help in small but important ways). But it’s still dire. At work we are facing these issues every other week on some code base.

hadley | 2 years ago

I'd love to hear more about this because from my perspective renv does seem to solve 95% of the challenges that folks face in practice. I wonder what makes your situation different? What are we missing in renv?

klmr | 2 years ago

Oh, I totally agree that ‘renv’ probably solves 95% of problems. But those pesky 5%…

I think that most problems are ultimately caused by the fact that R packages cannot really declare versioned dependencies (most packages only declare `>=` dependencies, even though they could also give upper bounds [1]; and that is woefully insufficient), and installing a package’s dependencies will (almost?) always install the latest versions, which may be incompatible with other packages. But at any rate ‘renv’ currently seems to ignore upper bounds: e.g. if I specify `Imports: dplyr (>= 0.8), dplyr (< 1.0)` it will blithely install v1.1.3.
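(For anyone unfamiliar with the mechanism: R-exts does permit bounding a dependency from both sides by listing the package twice in the DESCRIPTION file. A minimal sketch, with illustrative version numbers, would look like this:

```
Package: mypackage
Imports:
    dplyr (>= 0.8),
    dplyr (< 1.0)
```

`R CMD check` respects both constraints at check time; the problem described above is that the install tooling largely doesn’t.)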

The single one thing that causes most issues for us at work is a binary package compilation issue: the `configure` file for ‘httpuv’ clashes with our environment configuration, which is based on Gentoo Prefix and environment modules. Even though the `configure` file doesn’t hard-code any paths, it consistently finds the wrong paths for some system dependencies (including autotools). According to the system administrators of our compute cluster this is a bug in ‘httpuv’ (I don’t understand the details, and the configuration files look superficially correct to me, but I haven’t tried debugging them in detail, due to their complexity). But even if it were fixed, the issue would obviously persist for ‘renv’ projects requiring old versions.

(We are in the process of introducing a shared ‘renv’ package cache; once that’s done, the particular issue with ‘httpuv’ will be alleviated, since we can manually add precompiled versions of ‘httpuv’, built using our workaround, to that cache.)
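(For reference, a shared ‘renv’ cache like this is usually configured by pointing the `RENV_PATHS_CACHE` environment variable at a common location, e.g. in a site-wide or user `Renviron` file; the path below is just a placeholder for our setup:

```
# in Renviron.site or ~/.Renviron
RENV_PATHS_CACHE=/shared/renv/cache
```

Every project on the cluster then links packages out of that one cache instead of recompiling them.)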

Another issue is that ‘renv’ attempts to infer dependencies rather than having the user declare them explicitly (à la pyproject.toml dependencies), and this is inherently error-prone. I know this behaviour can be changed via `settings$snapshot.type("explicit")` but I think some of the issues we’re having are exacerbated by this default, since `renv::status()` doesn’t show which dependencies are direct and which are transitive.
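(For anyone wanting to follow along: switching a project over looks roughly like this. With explicit snapshots, ‘renv’ reads direct dependencies from the project’s own DESCRIPTION file rather than scanning sources; the dplyr entry below is purely illustrative:

```r
# run once inside the renv project
renv::settings$snapshot.type("explicit")

# renv then takes direct dependencies from the project DESCRIPTION, e.g.:
#   Imports:
#       dplyr (>= 1.1.0)
```

)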

Lastly, we’ve had to deactivate ‘renv’ sandboxing since our default library is rather beefy and resides on NFS, and initialising the sandbox makes loading ‘renv’ projects prohibitively slow — every R start takes well over a minute. Of course this is really a configuration issue: as far as I am concerned, the default R library should only include base and recommended packages. But in my experience it is incredibly common for shared compute environments to push lots of packages into the default library. :-(
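(For others hitting the same slowdown: the sandbox can be disabled through renv’s configuration; the environment-variable form, which can live in a project- or user-level `.Renviron`, is:

```
# in .Renviron
RENV_CONFIG_SANDBOX_ENABLED=FALSE
```

)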

---

[1] R-exts: “A package or ‘R’ can appear more than once in the ‘Depends’ field, for example to give upper and lower bounds on acceptable versions.”

apwheele | 2 years ago

Agree with this. I am pretty agnostic on the pandas vs R stuff (I prefer base R to tidyverse, and I like pandas, but realize I am old and probably not in the majority based on comments online). But many of the "R adherent" teams I talk to are not deploying software in varying environments so much as they are reporting shops doing ad-hoc analytics.

For those who want to use both R and Python, I have notes on using conda for R environments: https://andrewpwheeler.com/2022/04/08/managing-r-environment....

disgruntledphd2 | 2 years ago

Can you not just build your own code as a package and specify exact dependencies?

It's a bit of a faff but that seems like it should work (but maybe I'm missing something).

getoffmycase | 2 years ago

I basically don’t use anything outside of tidyverse or base R because of the package dependency issues.

wodenokoto | 2 years ago

At my old job we snapshotted CRAN and pinned versions of package dependencies _against_ CRAN.