

klmr | 2 years ago

Oh, I totally agree that ‘renv’ probably solves 95% of problems. But those pesky 5%…

I think that most problems ultimately stem from the fact that R packages cannot really declare versioned dependencies in a useful way: most packages only declare `>=` dependencies, which is woefully insufficient, even though they could also give upper bounds [1]. Furthermore, installing a package’s dependencies will (almost?) always install their latest versions, which may be incompatible with other packages. At any rate, ‘renv’ currently seems to ignore upper bounds: for example, if I specify `Imports: dplyr (>= 0.8), dplyr (< 1.0)` it will blithely install v1.1.3.
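Because the upper bound is not enforced at install time, a post-install check is one way to catch a violation. The following is a minimal sketch in base R. `satisfies_bounds()` is a hypothetical helper (not part of R or ‘renv’); it treats the lower bound as inclusive and the upper bound as exclusive, mirroring the `>=` / `<` constraints above.

```r
# Hypothetical helper (not part of base R or 'renv'): check that a version
# satisfies an inclusive lower bound and an exclusive upper bound.
satisfies_bounds <- function(version, lower, upper) {
  v <- package_version(version)
  v >= package_version(lower) && v < package_version(upper)
}

satisfies_bounds("0.8.5", lower = "0.8", upper = "1.0")  # TRUE
satisfies_bounds("1.1.3", lower = "0.8", upper = "1.0")  # FALSE: violates the upper bound
```

Against an installed package, something like `satisfies_bounds(as.character(packageVersion("dplyr")), "0.8", "1.0")` would flag the v1.1.3 install.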

The single thing that causes most issues for us at work is a compilation issue with a package containing native code: the `configure` file for ‘httpuv’ clashes with our environment configuration, which is based on Gentoo Prefix and environment modules. Even though the `configure` file doesn’t hard-code any paths, it consistently finds the wrong paths for some system dependencies (including autotools). According to the system administrators of our compute cluster this is a bug in ‘httpuv’ (I don’t understand the details; the configuration files look superficially correct to me, but I haven’t tried debugging them in detail because of their complexity). But even if it were fixed, the issue would obviously persist for ‘renv’ projects requiring old versions.

(We are in the process of introducing a shared ‘renv’ package cache; once that’s done, the particular issue with ‘httpuv’ will be alleviated, since we can manually add precompiled versions of ‘httpuv’, built using our workaround, to that cache.)
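For reference, one way to set up such a shared cache is to point every user's `RENV_PATHS_CACHE` environment variable at the same directory, for example in a site-wide or per-user `.Renviron`. A minimal sketch; the path below is a hypothetical placeholder, not our actual setup:

```
# .Renviron (hypothetical path): all renv projects share one package cache
RENV_PATHS_CACHE=/shared/renv/cache
```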

Another issue is that ‘renv’ attempts to infer dependencies rather than having the user declare them explicitly (à la `pyproject.toml` dependencies), and this is inherently error-prone. I know this behaviour can be changed via `renv::settings$snapshot.type("explicit")`, but I think some of the issues we’re having are exacerbated by this default, since `renv::status()` doesn’t show which dependencies are direct and which are transitive.
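A minimal sketch of switching a project to explicit mode, assuming the project has a `DESCRIPTION` file listing its direct dependencies: ‘renv’ then snapshots the packages declared there (together with the packages they depend on) rather than whatever it infers from scanning the project's R code.

```r
# Record only the packages declared in the project's DESCRIPTION file
# (and their dependencies) in renv.lock, instead of inferred ones.
renv::settings$snapshot.type("explicit")
renv::snapshot()
```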

Lastly, we’ve had to deactivate ‘renv’ sandboxing, since our default library is rather beefy and resides on NFS, and initialising the sandbox makes loading ‘renv’ projects prohibitively slow: every R start takes well over a minute. Of course this is really a configuration issue: as far as I am concerned, the default R library should only include base and recommended packages. But in my experience it is incredibly common for shared compute environments to push lots of packages into the default library. :-(
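For anyone hitting the same problem: ‘renv’ reads its configuration options from `RENV_CONFIG_*` environment variables, so sandboxing can be switched off via `.Renviron`. A minimal sketch, assuming a `.Renviron` in the project directory or the user's home directory:

```
# .Renviron: disable renv's sandboxing of the system library
RENV_CONFIG_SANDBOX_ENABLED=FALSE
```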

---

[1] R-exts: “A package or ‘R’ can appear more than once in the ‘Depends’ field, for example to give upper and lower bounds on acceptable versions.”
