(no title)
xjia | 2 years ago
By spoofing I meant that an authenticated but malicious client (intentionally or not, e.g. a clueless intern) may be able to write malicious content to the cache. For example, their build toolchain could be contaminated, so the build outputs it produces are contaminated too. The "action" per se and its hash are still legit, but the hash is only used as the lookup key; the corresponding value is what gets "spoofed."
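To make the concern concrete, here's a toy Python sketch (purely illustrative, not any real cache's API) of why a legitimate action hash doesn't protect the value stored under it:

    import hashlib

    cache = {}  # the remote action cache: key -> output bytes

    def action_key(sources: bytes, flags: str) -> str:
        # The key honestly describes the action (inputs + flags)...
        return hashlib.sha256(sources + flags.encode()).hexdigest()

    key = action_key(b"int main() { return 0; }", "-O2")

    # An honest client uploads the real build output.
    cache[key] = b"...legit object code..."

    # A client with a contaminated toolchain writes under the SAME key,
    # because the key is derived from the inputs, not from the output.
    cache[key] = b"...contaminated object code..."

    # Every later reader of this key now gets the contaminated value.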
The only safe way I can imagine to use such a remote cache is for CI to publish its build results so that developers can reuse them. The other direction -- from developers to other developers, or even to CI -- seems difficult to secure and less valuable. But I might be missing some important insights here, so my conclusion could be wrong.
But if that's the case, is the most valuable use case simply to configure CI to read from and write to the remote cache, and developers to only read from it? And under that assumption, is a remote cache product much easier to design/implement?
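For concreteness, I believe Bazel can already express this split with its standard remote-cache flags; a minimal .bazelrc sketch, if I understand the flags correctly (the cache URL is a placeholder, and I'm ignoring auth details):

    # Developers: read from the cache, never write to it.
    build --remote_cache=grpcs://cache.example.com
    build --noremote_upload_local_results

    # CI: invoked with `bazel build --config=ci`, also writes results.
    build:ci --remote_cache=grpcs://cache.example.com
    build:ci --remote_upload_local_results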
sgammon | 2 years ago
One goal of these tools is to guarantee that such a misconfiguration results in a cache key mismatch, rather than a hit and a bug.
There are tons of challenges in designing a remote build cache product, like anything, but that guarantee has turned out to hold reliably.
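Concretely, the way to get that guarantee is to fold everything that can change the output into the key itself; a minimal sketch (the function and field names here are made up for illustration):

    import hashlib, json

    def cache_key(source_digests, compiler_version, flags, env_subset):
        # Hash EVERY input that can affect the output: sources, the
        # exact toolchain, flags, and the relevant environment bits.
        material = json.dumps({
            "srcs": sorted(source_digests),
            "cc": compiler_version,  # e.g. "clang 17.0.6 x86_64-linux"
            "flags": sorted(flags),
            "env": env_subset,
        }, sort_keys=True)
        return hashlib.sha256(material.encode()).hexdigest()

A client on a different compiler, or with different flags, derives a different key, so it misses instead of hitting someone else's incompatible artifact.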
Some other interesting insights:
- transmitting large objects is often not profitable, so we found that setting reasonable size caps on what’s shared with the cache can be really effective at keeping transfers small and hits fast (a sketch of this, together with deferred uploads, follows this list)
- deferring uploads is important because you can’t penalize individual devs for contributing to the cache, and not everybody has a fast upload link. Making this part smooth is important so that everyone can benefit from every compile.
- build caching is ancient (Make does its own simple form of it), but the protocols for sharing it vary greatly in robustness, from WebDAV in ccache to Bazel’s gRPC interface
- most GitHub Actions builds occur in a small physical area, so accelerating build artifacts for them is an easier problem than, say, full-blown CDN serving
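Here's the promised sketch of the size cap plus deferred uploads, in Python (put_to_remote is a stand-in for whatever transport the cache actually uses):

    import queue, threading

    MAX_ENTRY_BYTES = 8 * 1024 * 1024  # illustrative cap; tune per project
    uploads = queue.Queue()

    def put_to_remote(key, artifact):
        pass  # stand-in for the real transport (gRPC, HTTP PUT, ...)

    def maybe_enqueue_upload(key, artifact):
        # Size cap: huge objects can cost more to ship than to rebuild.
        if len(artifact) <= MAX_ENTRY_BYTES:
            # Deferral: never block the dev's build on their uplink.
            uploads.put((key, artifact))

    def upload_worker():
        # Drains the queue in the background, after builds have returned.
        while True:
            key, artifact = uploads.get()
            try:
                put_to_remote(key, artifact)
            except Exception:
                pass  # a failed upload must never become a build error
            finally:
                uploads.task_done()

    threading.Thread(target=upload_worker, daemon=True).start()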
The assumptions that definitely help:
- it’s a cache, not a database; things can be missing, it doesn’t need strong consistency
- replication lag is okay because a build cache entry is typically not requested multiple times in a short window of time; the client that created it has it locally
- it’s much better to give a fast miss than a slow hit, since the compiler is quite fast
- it’s much better to give a fast miss than an error. You can NEVER break a build; at worst it should just not be accelerated (a client-side sketch of this follows the list).
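Here's what that fail-open lookup can look like on the client, as a minimal sketch (fetch_remote and build_locally are hypothetical stand-ins):

    import socket

    LOOKUP_TIMEOUT_S = 0.15  # past this, a hit is slower than rebuilding

    def get_or_build(key, fetch_remote, build_locally):
        # Every failure mode (timeout, network error, corrupt entry)
        # degrades to a miss: the cache can never break the build.
        try:
            hit = fetch_remote(key, timeout=LOOKUP_TIMEOUT_S)
            if hit is not None:
                return hit
        except (socket.timeout, OSError, ValueError):
            pass  # fast miss, never an error
        return build_locally()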
It’s an interesting problem to work on for sure.