klmr|2 years ago
Oh, it’s very common unless you basically only use fewer than five packages that are completely stable and no longer actively developed: packages break backwards compatibility all the time, in small and in big ways, and version pinning in R categorically does not work as well as in Python, despite all the issues with the latter. People joke about the complex packaging ecosystem in Python, but at least there is such a thing. R has no equivalent. In Python, if you have a versioned lockfile, anybody can redeploy your code unless a system dependency broke. In R, even with an ‘renv’ lockfile, installing the correct package versions is a crapshoot and will frequently fail. Don’t get me wrong, ‘renv’ has made things much better (and ‘rig’ and PPM also help in small but important ways). But it’s still dire. At work we are facing these issues every other week on some code base.
hadley|2 years ago
klmr|2 years ago
I think that most problems are ultimately caused by the fact that R packages cannot really declare versioned dependencies (most packages only declare `>=` dependencies, even though they could also give upper bounds [1]; and that is woefully insufficient), and installing a package’s dependencies will (almost?) always install the latest versions, which may be incompatible with other packages. But at any rate ‘renv’ currently seems to ignore upper bounds: e.g. if I specify `Imports: dplyr (>= 0.8), dplyr (< 1.0)` it will blithely install v1.1.3.
The single thing that causes the most issues for us at work is a binary package compilation issue: the `configure` file for ‘httpuv’ clashes with our environment configuration, which is based on Gentoo Prefix and environment modules. Even though the `configure` file doesn’t hard-code any paths, it consistently finds the wrong paths for some system dependencies (including autotools). According to the system administrators of our compute cluster this is a bug in ‘httpuv’ (I don’t understand the details, and the configuration files look superficially correct to me, but I haven’t tried debugging them in detail, due to their complexity). But even if it were fixed, the issue would obviously persist for ‘renv’ projects requiring old versions.
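To make the upper-bound example concrete, here is a minimal base-R sketch of the check an installer would have to perform for `dplyr (>= 0.8), dplyr (< 1.0)`. The helper name `satisfies_bounds` is hypothetical (it is not part of ‘renv’ or of R itself); it only illustrates the version comparison, not how any tool actually resolves dependencies.

```r
# Hypothetical helper: does `version` satisfy lower <= version < upper?
# package_version() parses the strings so that comparisons are done
# component-wise rather than as plain text.
satisfies_bounds <- function(version, lower, upper) {
  v <- package_version(version)
  v >= package_version(lower) && v < package_version(upper)
}

# The version from the example above violates the upper bound.
satisfies_bounds("1.1.3", "0.8", "1.0")  # FALSE

# A version inside the declared range passes.
satisfies_bounds("0.8.5", "0.8", "1.0")  # TRUE
```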
(We are in the process of introducing a shared ‘renv’ package cache; once that’s done, the particular issue with ‘httpuv’ will be alleviated, since we can manually add precompiled versions of ‘httpuv’, built using our workaround, to that cache.)
Another issue is that ‘renv’ attempts to infer dependencies rather than having the user declare them explicitly (a la pyproject.toml dependencies), and this is inherently error-prone. I know this behaviour can be changed via `settings$snapshot.type("explicit")` but I think some of the issues we’re having are exacerbated by this default, since `renv::status()` doesn’t show which ones are direct and which are transitive dependencies.
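For reference, a rough sketch of what the explicit setup could look like, assuming the direct dependencies are listed in a project-level `DESCRIPTION` file. The temporary directory, the `Type: project` field and the two package names are placeholders for illustration; the ‘renv’ calls are left as comments because they only make sense inside an actual ‘renv’ project.

```r
# Write a project-level DESCRIPTION that lists direct dependencies explicitly.
# Directory and package names are placeholders chosen for illustration.
project_dir <- tempfile("renv-explicit-")
dir.create(project_dir)

desc <- data.frame(
  Type = "project",
  Description = "Example project with explicitly declared dependencies.",
  Imports = "dplyr, httpuv",
  stringsAsFactors = FALSE
)
desc_path <- file.path(project_dir, "DESCRIPTION")
write.dcf(desc, desc_path)

# Read the declared direct dependencies back from the file.
imports <- read.dcf(desc_path, fields = "Imports")[1, 1]
direct_deps <- strsplit(imports, ",\\s*")[[1]]
print(direct_deps)

# Inside the renv project itself, one would then switch the snapshot type:
# renv::settings$snapshot.type("explicit")
# renv::snapshot()
```

With this layout the direct dependencies are whatever the `DESCRIPTION` says, so they can be told apart from transitive ones even if `renv::status()` does not show the distinction.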
Lastly, we’ve had to deactivate ‘renv’ sandboxing since our default library is rather beefy and resides on NFS, and initialising the sandbox makes loading ‘renv’ projects prohibitively slow: every R start takes well over a minute. Of course this is really a configuration issue: as far as I am concerned, the default R library should only include base and recommended packages. But in my experience it is incredibly common for shared compute environments to push lots of packages into the default library. :-(
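For anyone in the same situation, a minimal sketch of turning the sandbox off, assuming the ‘renv’ configuration option `renv.config.sandbox.enabled` (or its environment-variable form `RENV_CONFIG_SANDBOX_ENABLED`). Where exactly to put this depends on the site setup; a user-level or site-level R profile is an assumption here, not a recommendation from the ‘renv’ authors.

```r
# Disable the renv sandbox for this R session. This is assumed to live in a
# user-level or site-level R profile so that it is set before a project's
# renv activation script runs.
options(renv.config.sandbox.enabled = FALSE)

# Assumed equivalent via an environment variable (e.g. in an Renviron file):
# RENV_CONFIG_SANDBOX_ENABLED=FALSE
```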
---
[1] R-exts: “A package or ‘R’ can appear more than once in the ‘Depends’ field, for example to give upper and lower bounds on acceptable versions.”
apwheele|2 years ago
For those who want to use both R and Python, I have notes on using conda for R environments, https://andrewpwheeler.com/2022/04/08/managing-r-environment....
disgruntledphd2|2 years ago
It's a bit of a faff but that seems like it should work (but maybe I'm missing something).
getoffmycase|2 years ago
wodenokoto|2 years ago
hadley|2 years ago