item 41234070


hlandau | 1 year ago

Personally I've always considered it bad hygiene to commit generated outputs, but this article notes that this takes on a new significance in the light of supply chain security concerns. Good changes from PostgreSQL here.

Generated output, vendored source trees, etc. aren't, or can't be, meaningfully audited as part of a code review process, so they're basically merged without real audit or verification.

My personal preference is never to include generated output in a repository or tarball, including e.g. autoconf/automake scripts. This is directly contrary to the advice of the autotools documentation, which wants people to ship these unauditably gargantuan and obtuse generated scripts as part of tarballs... an approach which created an ideal space for things like the XZ backdoor.

nrabulinski|1 year ago

My take is that they should always be committed, but never generated by the dev; instead, they should be generated and pushed when necessary by CI. The problem with generating those files yourself is that, in many cases, the output is nondeterministic and nonreproducible. In an ideal world those tools would just generate those files deterministically, but until then, committing them from CI is an acceptable stopgap for me.

koolba|1 year ago

My preference is to do both. Have them generated by a dev, committed, and also generated in CI. The latter gets compared with the checked-in contents to ensure the results match the expected value.

This speeds up CI (the generation path can be done in parallel) and most local development.

The one catch is that it relies on mostly trusting whoever has a commit bit. But if you don’t have that and any part of the build involves scripts that are part of the repo itself, then you’ve already lost.
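The regenerate-and-compare step described above can be sketched as a small CI shell function. The generator command and directory names here are hypothetical stand-ins for a project's real generator and output paths:

```shell
#!/bin/sh
# CI-side check: re-run the generator into a scratch directory and fail
# if the result differs from what is committed. This assumes the
# generator is deterministic; command names and paths are illustrative.

check_generated() {
    gen_cmd=$1    # command that writes generated files into the dir given as its argument
    committed=$2  # directory holding the checked-in generated files
    fresh=$(mktemp -d)

    # Re-run the (assumed deterministic) generator into the scratch dir.
    "$gen_cmd" "$fresh"

    # Compare the fresh output against what was committed.
    if diff -r "$committed" "$fresh" >/dev/null; then
        echo "generated files are up to date"
        rm -rf "$fresh"
        return 0
    else
        echo "error: committed generated files are stale; re-run the generator" >&2
        rm -rf "$fresh"
        return 1
    fi
}
```

A real pipeline would run something like `check_generated ./generate.sh generated/` after checkout; an equivalent approach is to regenerate in place and use `git diff --exit-code` to fail the build on any difference.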

KptMarchewa|1 year ago

No, they should be generated by either dev or something like pre-commit and then checked if they match what's generated by CI.

And yes, those have to be deterministic with regard to their inputs; it doesn't make sense otherwise.

EuAndreh|1 year ago

No unauditable generated code for me, either manually or automatically, thanks.

bonzini|1 year ago

> an approach which created an ideal space for things like the XZ backdoor.

That's not entirely correct. Part of the xz backdoor did indeed live in the configure script. However, that part was also present in the sources of the configure script as found in the tarball (and not in the git tree).

Thus regenerating the configure script didn't help, but regenerating the tarball did.

EuAndreh|1 year ago

In that case, I'd say the autotools documentation's advice is outdated at best, and one shouldn't follow it.

It adds unneeded complexity.

bluGill|1 year ago

They do not, and never did, commit generated files (as far as I can tell). Their release process used to generate some files and place them into a distribution tarball, but that tarball was never committed anywhere.

miki123211|1 year ago

The same applies to refactorings unfortunately.

If you make a large but simple refactoring, like renaming a frequently-used function across a large repo, nobody is going to audit that diff and check for extra changes.

Things don't have to be this way: Google's source control system apparently has tools that can perform such refactorings for you in a centralized fashion, and one could build something similar for git.

prpl|1 year ago

Taking this to the extreme, though, I really, really hate getting an autoconf project with no generated configure script. I don't want to install the full autotools suite just to build!

On the other hand, keeping tarballs close to the git tree makes it easy to reuse git archive and related GitHub features, provided the repo properly includes some kind of versioning information in tree.

vbezhenar|1 year ago

Linux software sources are in a weird spot between users and developers.

I, as a developer, organize sources in a way that makes it easy for another developer to work with them. My software will never be compiled by any user; all my users use build artifacts.

I might consider adding autogenerated code, but only when I'm, say, 99% sure that this code won't ever change. For example, that's the case for integrations with many organizations where WSDLs are agreed upon once and then never touched. Having the Java sources regenerated on every build just adds a few seconds to every build without any noticeable advantage.

The fact that some Linux users prefer to build software from source, yet at the same time don't want to install the necessary build tools, is a bit of a strange situation.

Maybe containers should be better utilized for this workflow: the developer supplies a Dockerfile which builds the software and then copies it to some directory. You run `docker build .` and then copy the binary files from the container to the host.
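That workflow can be sketched as a multi-stage Dockerfile; the base images, the Go toolchain, and the `myapp` binary name here are purely illustrative assumptions, not anything from the thread:

```dockerfile
# Build stage: compile inside the container so users need no local toolchain.
# (base image and build command are hypothetical examples)
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN go build -o /out/myapp .

# Final stage: a minimal image holding only the built artifact.
FROM debian:bookworm-slim
COPY --from=build /out/myapp /usr/local/bin/myapp
```

After `docker build -t myapp .`, the binary can be copied back to the host with `docker create` followed by `docker cp`, or, with BuildKit, exported directly to a host directory via `docker build --output`.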

jeltz|1 year ago

PostgreSQL also supports Meson, which requires no generated files to be convenient.

cryptonector|1 year ago

Including autoconf outputs serves to avoid having to have autoconf installed. Because autoconf installs historically lagged behind what autoconf-using projects required, this used to be a problem. Nowadays it's not that big a deal.

As u/nrabulinski says, you can have the CI system generate and commit (with signed commits) autoconf artifacts.

EuAndreh|1 year ago

> Nowadays it's not that big a deal.

The same can be said about autotools itself :/

Historical and current use do indeed vary, and often even using autotools at all isn't the appropriate choice anymore.

malkia|1 year ago

Generated outputs are important to keep for debugging later, especially when they are source code (headers, etc.).