The default behavior of cargo is to download stuff from the internet. This may be the least reproducible thing ever.
enriquto|3 years ago
I'm honestly astonished that programmers of a language that is deemed to be "safe by default" thought that this behavior was acceptable in any form, let alone the default. If downloading things at build time is somehow necessary, it should be an obscure option behind a flag with a scary name, like --extremely-unsafe-i-know-what-i-am-doing, that prompts the user with a small Turing test every time it is run. Cargo is just bonkers; it doesn't matter at all whether it is "convenient" or not. Convenience before basic safety and reproducibility is contrary to the spirit of the language itself.
It's as if bounds checking in the language were deferred to a third party that you need to "trust" in order to believe that you won't have segmentation faults.
thecrm|3 years ago
It doesn't just download random things. Cargo generates a Cargo.lock file with checksums and will make sure that those checksums match when building later on. It's about as safe as vendoring all dependencies while being far easier to work with (though tools like cargo-vendor do exist, of course).
Edit: for things like the kernel, vendoring dependencies is still probably not a bad idea, of course
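For anyone who hasn't looked inside one, each pinned dependency in Cargo.lock records the exact version, the registry it came from, and a sha256 checksum of the published .crate file, which cargo re-verifies on later builds. An entry looks roughly like this (the crate name/version and checksum below are illustrative, not a real crate's actual values):

```toml
[[package]]
name = "serde"
version = "1.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0000000000000000000000000000000000000000000000000000000000000000"
```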
roca|3 years ago
If your project has a Cargo.lock file checked into its repo, then everyone checking it out will download the same code for all dependencies (unless someone manages to compromise the crates.io package archive). That is very far from "the least reproducible thing ever".
KronisLV|3 years ago
> The default behavior of cargo is to download stuff from the internet.
This is borderline inevitable for most modern development stacks, though .lock files can definitely help, even adding hashes to check against if you care about your dependencies staying the same as when you first downloaded/added them to the project and/or inspected the code.
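The mechanism those hashes buy you can be sketched in a few lines of plain Python (nothing cargo- or npm-specific; `verify_artifact` is a made-up helper name, not any tool's real API):

```python
import hashlib

def verify_artifact(data: bytes, locked_sha256: str) -> bool:
    """Compare a downloaded artifact against the hash pinned in a lock file."""
    return hashlib.sha256(data).hexdigest() == locked_sha256

# Simulate the hash recorded in a lock file when the dependency was first added.
artifact = b'fn main() { println!("hello"); }'
locked = hashlib.sha256(artifact).hexdigest()

assert verify_artifact(artifact, locked)              # same bytes: accepted
assert not verify_artifact(artifact + b"!", locked)   # tampered bytes: rejected
```

As long as the lock file itself is checked into the repo and reviewed, a registry (or a proxy in between) can't silently swap the contents out from under you.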
As for worries about the things at those URLs disappearing, in most cases you should be using a proxy repository of some sort, which I've seen leveraged often in enterprise environments: something like JFrog Artifactory or Sonatype Nexus, with repositories either global or set up on a per-project basis.
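For Rust specifically, pointing cargo at such a proxy is a few lines of source replacement in .cargo/config.toml; the `nexus.example.com` URL below is a placeholder for wherever your Artifactory/Nexus proxy lives, and the `sparse+` registry protocol assumes a reasonably recent cargo:

```toml
# .cargo/config.toml: redirect all crates.io downloads through an internal proxy
[source.crates-io]
replace-with = "internal-proxy"

[source.internal-proxy]
registry = "sparse+https://nexus.example.com/repository/crates-proxy/"
```

Cargo.lock checksums still apply, so a misbehaving proxy gets caught at build time rather than silently swapping code.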
The problem here is that all of these repositories kind of suck and that the ecosystem around them also does:
- for example, Nexus routinely fails to remove all of the proxied container images and their blobs that are older than a certain date, bloating disk space usage
- when proxying npm, Nexus needs additional reverse proxy configuration, since URL encoded slashes aren't typically allowed
- many popular formats, like Composer (or plenty of more niche ones), are only community supported https://help.sonatype.com/repomanager3/nexus-repository-administration/formats (nobody will ever cover *all* of the formats you need, unless you limit yourself to very popular stacks)
- many of the tech stacks that have .lock files may also include URLs to the registry/repository from which they're acquired, so some patching might be necessary
- in technologies like Ruby, actually setting up the proxy isn't as easy as running something like "bundle install --registry=..." as it is in npm
- in other technologies, like Java, you get into the whole SNAPSHOT vs RELEASE issue, and even setting up publishing your own packages to something like Nexus can be a bit of work; the lack of proper code libraries for reuse and the abundance of copy-pasted code that I've seen is, in my mind, proof of this
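To be fair to the Ruby point above, Bundler does have a mirror mechanism; it's just global configuration rather than a per-install flag (the hostname below is a placeholder for your own proxy):

```shell
# Route all gems that would come from rubygems.org through a proxy instead
bundle config mirror.https://rubygems.org https://nexus.example.com/repository/rubygems-proxy/
```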
Of course, I'm mentioning various tech stacks here, and I don't doubt that in the long term Rust and other technologies might address their own individual shortcomings, but my point is that dependency management is just a hard problem in general.
So, for most people, the approach they'll take is to just install stuff from the Internet that other people trust and hope that the toolchain works as expected, a black box of sorts. I've seen plenty of people add packages without auditing 100% of the source code, which seems like the inevitable reality when you're just trying to build some software under time/resource constraints.
easytiger|3 years ago
Wait till you find out about Java ecosystems.
I know investment bank dev teams pulling whatever they need from Maven Central with no oversight or introspection.
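The usual fix on the Maven side is a mirror entry in ~/.m2/settings.xml, so nothing is pulled from Maven Central directly (the URL below is a placeholder for an internal proxy):

```xml
<!-- ~/.m2/settings.xml: route every repository through an internal proxy -->
<settings>
  <mirrors>
    <mirror>
      <id>corp-proxy</id>
      <mirrorOf>*</mirrorOf>
      <url>https://nexus.example.com/repository/maven-public/</url>
    </mirror>
  </mirrors>
</settings>
```

That at least gives the organization one choke point for caching and auditing, even if it does nothing about teams adding dependencies without review.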
j-krieger|3 years ago
unknown|3 years ago
[deleted]