
Debian reproducibility statistics

188 points| lamby | 8 years ago |tests.reproducible-builds.org | reply

70 comments

[+] YouKnowBetter|8 years ago|reply
On a side note: many companies demand assurances from their software vendor, so that the vendor a) can be held liable and b) will hand over the source in case of drama, usually by having the source code placed in escrow. The number one error these companies make is to think that source code without a reproducible build environment means anything at all. Reproducibility is not easy. I applaud Debian (and in the older days TrueCrypt) for giving this more exposure.
[+] bluGill|8 years ago|reply
Someplace my company has a closet with a computer containing Windows XP (not sure which service pack, but not the latest) and an old version of Visual Studio, just in case we need to fix a bug. I know we still have it because the project got resurrected recently and we had to clone the hard drive a few times because nobody could find a copy of that version of Visual Studio that can be installed. (Microsoft dropped support for whatever version of WinCE with no upgrade path.)
[+] bialpio|8 years ago|reply
Can someone shed some light on what exactly "reproducible" means in this context?
[+] dozzie|8 years ago|reply
It's a term for binaries (usually ELF) being byte-for-byte identical across two different runs of a compiler. This way you can build a binary package from a source package, and if its content is the same, you know what source code was used to build the package. Then you can e.g. inspect the code for backdoors or build debug symbols without planning for that beforehand.

https://en.wikipedia.org/wiki/Reproducible_build
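
To make the "byte-for-byte identical" check concrete, here is a minimal Python sketch. It assumes you already have the bytes of two independently built artifacts; the names `artifact_digest`, `builds_match` and `files_match` are made up for illustration and are not part of any Debian tooling.

```python
import hashlib
from pathlib import Path


def artifact_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a build artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def builds_match(build_a: bytes, build_b: bytes) -> bool:
    """True only if the two artifacts are byte-for-byte identical."""
    return artifact_digest(build_a) == artifact_digest(build_b)


def files_match(path_a: str, path_b: str) -> bool:
    """Same comparison for two artifacts stored on disk (hypothetical paths)."""
    return builds_match(Path(path_a).read_bytes(), Path(path_b).read_bytes())
```

Comparing digests rather than raw bytes is a choice made here so that one party can publish a hash and anyone else can compare their own build against it.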

[+] vegbrasil|8 years ago|reply
Does Docker (or any other container platform) help with reproducible builds? Making the environment standard between builds is probably easier in a container.
[+] samueloph|8 years ago|reply
Debian builders already run on sbuild[1], which uses chroot, so it's already a contained environment.

I'm not sure how the reproducible-builds machines go about forcing two different build results, but I'd bet they're using sbuild too.

Here's a diffoscope of one of my packages which is not reproducible right now[2].

I also highly recommend the reproducible builds talk given by the repro build folks at this year's DebConf[3].

[1] https://wiki.debian.org/sbuild

[2] https://tests.reproducible-builds.org/debian/rb-pkg/unstable...

[3] https://debconf17.debconf.org/talks/14/

[+] 0xcde4c3db|8 years ago|reply
Docker is part of a broader "reproducible build environment" strategy, but doesn't really help with some of the things that cause problems (timestamps, kernel version, random IDs).
[+] georgyo|8 years ago|reply
Builds have been happening in single use chroots for a very long time.

The problem is if code includes the current date, or generates some randomness at build time. These issues must be identified and patched.
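
One common shape for such a patch, sketched in Python: instead of always embedding the wall-clock time, the build reads the `SOURCE_DATE_EPOCH` environment variable (a convention specified by the reproducible-builds.org project) and only falls back to the current time when it is unset. The function name `build_timestamp` and the output format are assumptions for illustration, not taken from any particular package.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=os.environ) -> str:
    """Return the timestamp to embed in a build, as a UTC string.

    If SOURCE_DATE_EPOCH is set, use it, so that two builds of the same
    source embed the same date. Otherwise fall back to the current time,
    which makes the output differ from run to run.
    """
    raw = environ.get("SOURCE_DATE_EPOCH")
    seconds = int(raw) if raw is not None else int(time.time())
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S UTC")
```

Build-time randomness would need a similar treatment, for example a fixed seed, which this sketch does not cover.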

[+] zdw|8 years ago|reply
Docker images built with the first-party toolchain aren't reproducible - if you run `docker build ...` on a Dockerfile, then delete the image and rerun it, you'll get a different set of image hashes. This is likely due to timestamp embedding.

There are other toolsets that supposedly generate byte-identical Docker images (Bazel, some others), but I haven't tried them.
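
The general effect of timestamp embedding can be shown with a plain tar archive in Python. This is only an illustration of the mechanism and makes no claim about how Docker actually stores its layers; `make_tar` is a hypothetical helper. Identical file contents yield identical archive bytes only when the metadata (here the modification time and the member order) is pinned.

```python
import hashlib
import io
import tarfile


def make_tar(files: dict, mtime: int) -> bytes:
    """Build an uncompressed tar archive in memory with fixed metadata.

    `files` maps member names to their byte contents. Members are added in
    sorted order and all get the same `mtime`, so the result depends only
    on the arguments.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime  # the only metadata field varied in this sketch
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def digest(data: bytes) -> str:
    """SHA-256 hex digest, standing in for an image or layer hash."""
    return hashlib.sha256(data).hexdigest()
```

With the same `files` and the same `mtime`, `digest(make_tar(...))` is stable across runs; change only `mtime` and the digest changes even though no file content did.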

[+] entelechy|8 years ago|reply
I wonder if switching from cmake to buck[1] and buckaroo[2] would simplify and increase reproducibility.

[1] https://buckbuild.com

[2] https://buckaroo.pm

[+] guipsp|8 years ago|reply
^CEO of LoopPerfect, which maintains Buckaroo

Might want to disclose that.

[+] sfrigon|8 years ago|reply
Are there any known problems or pitfalls with CMake and reproducible builds? Just curious to know.
[+] geofft|8 years ago|reply
Well, this is Debian - they generally don't maintain the software, they just package it. Small changes can be pushed upstream, but "Here's a brand new build system that we promise is super cool" is generally not a patch that people like taking :)
[+] moosingin3space|8 years ago|reply
Based on the experiences of systemd, gtk, and X11 in switching to Meson, I'd think Meson might be the best choice here. While Buck, Bazel, Pants, etc. are designed for large projects, Meson is designed for small-to-medium-size projects and integrates with pkg-config, which in turn should provide simpler integration with distribution package managers. My experiences with Bazel demonstrated that integration with distribution packages can be quite difficult.
[+] pavlov|8 years ago|reply
The lack of a .yet TLD is a real missed opportunity.
[+] penpapersw|8 years ago|reply
Meh, joke sites like this aren't nearly as prominent as sites for apps, I'd really like a .app TLD instead. That said, the distinction between app and service is blurring a lot, so Spotify and VS Code probably both qualify for .app but one also has a web interface. Everything is confusing, let's just stick to .com
[+] lamby|8 years ago|reply
Mods, how come my post was completely edited? :)
[+] dang|8 years ago|reply
The is-foo-bar-yet.com -> NO/YES trope has been off topic on HN for years, mainly because such pages are unsubstantive but also because it's long been a cliché.

If we can find a more substantive page to link to on the same topic, we'll sometimes change the URL. Usually we post a comment explaining that we did so, but it depends who's on duty at the time.

[+] hmmm___food|8 years ago|reply
Maybe I'm missing something, but it seems the justification behind this is based on a situation in which the source wasn't open. Debian is open, so why is reproducibility a priority?
[+] yakubin|8 years ago|reply
On the contrary. Reproducibility means that given the source you are able to produce the exact same binary, i.e. you can verify that someone didn't modify it before building. When software isn't open, reproducibility is meaningless, because you don't have the source, so: 1. you can't verify anything (you can't compile anything); 2. assurance that the binary wasn't modified gives you nothing, since you can't tell whether the original source code itself contains malicious code.
[+] thechao|8 years ago|reply
I know Debian has lofty goals with respect to reproducibility, but on a purely "I hate waiting for builds" level, getting deterministic object files can prevent spurious relinking. For example, without them: change a comment, watch your code link for 10 minutes.
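
A rough Python sketch of why determinism helps here, assuming a build tool that decides whether to relink by comparing content hashes of object files (real build systems differ, and `needs_relink` is a made-up helper). If a comment-only change produces byte-identical object files, the digests match and the link step can be skipped.

```python
import hashlib


def needs_relink(previous_digests: dict, object_files: dict):
    """Decide whether the link step has to run again.

    `object_files` maps object file names to their current bytes;
    `previous_digests` maps names to the digests recorded at the last link.
    Returns (relink_needed, current_digests).
    """
    current = {
        name: hashlib.sha256(data).hexdigest()
        for name, data in object_files.items()
    }
    return current != previous_digests, current
```

If the compiler embeds something that varies per run, the digests differ on every build and this check never gets to skip the link.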
[+] stordoff|8 years ago|reply
Debian being open only gives you any sort of assurance if you can prove the binary you are using is compiled from that source. Without reproducible builds, you can only (easily) do that if you are building the source yourself (which obviously most people don't).

Reproducibility doesn't really make much sense when the code _isn't_ open - knowing that unknown source code can reliably produce the same output isn't that valuable.

[+] wiz21c|8 years ago|reply
Just to give an example, think about some software you put in a voting machine. Even if the code is open source and given to voters, they need to be able to compile it themselves and compare their compiled version to the one in the voting machine... (well, to be honest, this only makes sense if the voting machine runs the compiled code only and doesn't sneak other stuff in at runtime, but that's another story)
[+] kalmi10|8 years ago|reply
So that anyone can check whether a binary (on the mirrors) has been tampered with in ways that are not present in the source.