amluto | 2 days ago
I’m afraid you’re missing my point, though. A high quality build system takes fixed inputs and produces outputs that are, to the extent possible, only a function of the inputs. If there’s a separate process that downloads the inputs (and preferably makes sure they are bitwise identical to what is expected), fine, but that step should be strictly outside the inputs to the actual thing that produces the release artifact. Think of it as:
artifact = build_process(inputs)
inputs = fetch(credentials, cache, hashes, etc)
Or, even better perhaps:
inputs = …
assert hash(inputs) == expected
(And now, unless you accidentally hash your credentials into the expected hash, you can’t leak credentials into the output!)
Once you have commingled it so that it looks like:
final output, intermediate layers = monolithic_mess(credentials, cache, etc)
Then you completely lose track of which parts are deterministic, what lives in the intermediate layers, where the credentials go, etc.
Docker build is not a good build system, and it strongly encourages users to do this the wrong way. There are many, many things wrong with it, and only one of them is that the intermediate layers you might think of as a cache are also exposed as part of the output.
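To make the split concrete, here is a minimal Python sketch of the fetch / verify / build separation described above. It is an illustration under stated assumptions, not how any particular tool works: `digest` and `verify` are hypothetical helper names, `fetch` is a stand-in that copies an in-memory dict instead of doing an authenticated download, and `build_process` just concatenates its inputs.

```python
import hashlib


def digest(inputs: dict[str, bytes]) -> str:
    """Hash names and contents in sorted order, with length prefixes."""
    h = hashlib.sha256()
    for name in sorted(inputs):
        for chunk in (name.encode(), inputs[name]):
            h.update(len(chunk).to_bytes(8, "big"))
            h.update(chunk)
    return h.hexdigest()


def fetch(credentials: str, source: dict[str, bytes]) -> dict[str, bytes]:
    """Hypothetical stand-in for a download; the only step that sees credentials."""
    if not credentials:
        raise PermissionError("credentials required to fetch inputs")
    return dict(source)


def verify(inputs: dict[str, bytes], expected: str) -> dict[str, bytes]:
    """Refuse to continue unless the fetched inputs match the expected hash."""
    actual = digest(inputs)
    if actual != expected:
        raise ValueError(f"input hash mismatch: {actual} != {expected}")
    return inputs


def build_process(inputs: dict[str, bytes]) -> bytes:
    """Depends only on its argument: no network, no credentials, no clock."""
    return b"".join(
        name.encode() + b"\0" + inputs[name] + b"\0" for name in sorted(inputs)
    )


if __name__ == "__main__":
    upstream = {"a.txt": b"hello", "b.txt": b"world"}  # assumed remote content
    expected = digest(upstream)  # in practice this would be pinned in a lockfile
    inputs = verify(fetch("secret-token", upstream), expected)
    artifact = build_process(inputs)
    print(len(artifact), "bytes built from", len(inputs), "verified inputs")
```

Because `build_process` never receives the credentials, the only way they could end up in the artifact is if they were part of the verified inputs, which the hash check would have had to accept.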
stevenhuang | 23 hours ago
Hence my confusion of what you meant -- no one's saying ssh keys are in the CI build artifacts. But obviously they can be in the container as layers if people do it wrong, which is bad.
We're talking about the same thing basically. Yes fully defining your inputs to the container by passing in the keys is a good solution.
amluto | 13 hours ago
> the container is also a build artifact
By "build artifact" I mean the data that is the output of the build and gets distributed to other machines (or run locally perhaps). So a build artifact can be a tarball, an OCI image [0], etc. But calling a container a build artifact is really quite strange. A "container" is generally taken to mean the thing you might see in the output of 'docker container ls' or similar -- it's a whole pile of state including a filesystem, a bunch of volume mounts, and some running processes if it's not stopped. You don't distribute containers to other machines [1].
> in context of CI, the output of running the build using the container
I have no idea what you mean. What container? CI doesn't necessarily involve containers at all.
> no one's saying ssh keys are in the CI build artifacts. But obviously they can be in the container as layers if people do it wrong, which is bad.
If the build artifact is an image, and the keys are in the image, then the keys are in the build artifact.
> Yes fully defining your inputs to the container by passing in the keys is a good solution.
Are you suggesting doing a build by an incantation like:
This is IMO a terrible idea. A good build system DOES NOT PROVIDE KEYS TO THE BUILD PROCESS.
Yes, I realize that almost everyone fudges this because we have lots of tools that make it easy. Even really modern stuff like uv does this.
whoops, that uses optional credentials, fetches (hopefully locked-by-hash) dependencies, and builds. It's convenient for development. But for a production build, this would be much better if it were cleanly split into a fetch-the-dependencies step and a build step, with the build step running without network access or any sort of credentials.
[0] https://specs.opencontainers.org/image-spec/
[1] A build artifact could be a container snapshot, but that's different.