barefeg | 2 years ago

Could you give more details/resources on the distributed in cluster buildkit cache?

SOLAR_FIELDS | 2 years ago

We use Dagger’s implementation. The basic approach is to run a buildkit engine as a DaemonSet on the cluster and have clients point at the same Docker socket that buildkit uses. The magic is in cache synchronization, e.g. only lazily pulling layers as the client requests them. This is scalable, but since caching is hard there are some complexities around efficiently synchronizing cache layers and cache volumes. Cache synchronization is currently managed by a long-lived service that runs as a Deployment alongside a bunch of ephemeral runners.
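To make "lazily pulling layers" concrete, here is a toy sketch of the idea in Python. It is not Dagger's or buildkit's actual code: the `LazyLayerCache` class, the `REMOTE_STORE` dict, and the digests are all made up for illustration. The only point is that a layer crosses the network the first time a client asks for it, and never if nobody asks.

```python
from typing import Callable, Dict, List


class LazyLayerCache:
    """Toy local cache that fetches a layer only on first request."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote    # hypothetical remote-store lookup
        self._local: Dict[str, bytes] = {}   # layers already pulled to this node
        self.remote_fetches: List[str] = []  # digests that crossed the network

    def get(self, digest: str) -> bytes:
        if digest not in self._local:
            # Cache miss: pull this one layer, not the whole remote cache.
            self._local[digest] = self._fetch_remote(digest)
            self.remote_fetches.append(digest)
        return self._local[digest]


# Stand-in for a shared remote cache holding three layers (made-up digests).
REMOTE_STORE = {"sha256:aaa": b"base", "sha256:bbb": b"deps", "sha256:ccc": b"app"}

cache = LazyLayerCache(REMOTE_STORE.__getitem__)
cache.get("sha256:aaa")
cache.get("sha256:aaa")  # second request is served from the local copy
```

After those two calls only `sha256:aaa` has been fetched, and the other two layers stay remote until something requests them. The hard parts mentioned above (keeping many nodes' copies consistent, syncing cache volumes) are deliberately left out.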

There are several other architectures, ranging from simpler to more complex. The one I recommend people start with is a single long-lived, beefy buildkit instance shared by a bunch of runners, since that is much simpler to implement. The downside, of course, is that you have to rebuild the cache if the instance ever goes down. For runs that need read/write locks on volumes (e.g. the Gradle build cache), my recommendation after trial and error is to rsync those volumes to the runners and then rsync them back after the run completes, so you don’t have a bunch of locks fighting each other over the same folder.
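The rsync-in, build, rsync-out pattern can be sketched roughly like this in Python. The function names and the `run` parameter are hypothetical helpers, not part of any tool mentioned here. The sketch assumes `rsync` is installed on the runner and that the shared cache is reachable as a path.

```python
import subprocess
from typing import Callable, List, Sequence


def rsync_cmd(src: str, dest: str) -> List[str]:
    # -a preserves permissions and timestamps. The trailing slashes make
    # rsync copy the directory's contents rather than the directory itself.
    return ["rsync", "-a", src.rstrip("/") + "/", dest.rstrip("/") + "/"]


def run_with_private_cache(
    shared: str,
    local: str,
    build_cmd: Sequence[str],
    run: Callable = subprocess.run,  # injectable so the flow can be dry-run
) -> None:
    # 1. Copy the shared cache to a runner-private directory.
    run(rsync_cmd(shared, local), check=True)
    # 2. Build against the private copy, so any lock files live only here.
    run(list(build_cmd), check=True)
    # 3. Push results back. A failed build raises above and skips this step.
    run(rsync_cmd(local, shared), check=True)
```

A call might look like `run_with_private_cache("/mnt/shared/gradle-cache", "/tmp/gradle-cache", ["gradle", "build"])`, where both paths are placeholders. Note that if several runners write back the same file, the last one to finish wins. That is usually acceptable for content-addressed caches but worth checking for yours.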