item 24892094

chinhodado | 5 years ago

When I first joined one of my previous jobs, the build process had a checkout stage that blew away the .git folder and checked out the whole repo from scratch every time (!). Since the build machine was reserved for that build job, I simply changed it to do git clean -dfx && git reset --hard && git checkout origin/branch. It shaved something like 15 minutes off the build time, which was about 50% of the total.
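A minimal sketch of that in-place refresh, assuming a branch named "main"; it builds a throwaway local remote and clone so the commands run anywhere without network access:

```shell
# Throwaway local remote + clone so the refresh can be demonstrated
# offline (the repo paths and branch name here are placeholders).
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/remote.git"
git clone -q "$tmp/remote.git" "$tmp/work"
cd "$tmp/work"
git -c user.name=ci -c user.email=ci@example.com \
    commit -q --allow-empty -m "initial"
git push -q origin HEAD:main

# Simulate leftovers from a previous build: untracked build artifacts.
echo junk > stale-artifact.o
echo junk > untracked.txt

# The in-place refresh: remove untracked/ignored files, discard local
# changes, and sync to the remote branch tip -- no re-clone needed.
git fetch -q origin
git clean -dfxq
git reset -q --hard origin/main

git status --porcelain   # prints nothing: the working tree is clean
```

The point is that git clean + reset reuses the existing object database, so only new objects ever cross the network.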

mikepurvis | 5 years ago

It's frustrating how many ways there are for a git clone to get out of sync, especially when it's an automation-managed one that is supposed to be long-lived (think stuff like gracefully handling force-pushed branches and tags that are deleted). I've dealt with a bit of this with my company's Hound (code search engine) instance. Currently there's a big snarl of fallback logic in there that tries a shallow clone, but then unshallows and pulls refs if it can't find what it's looking for, culminating in this ridiculousness:

    git fetch --prune --no-tags --depth 1 origin "+{ref}:remotes/origin/{ref}"
See the whole thing here: https://github.com/mikepurvis/hound/blob/6b0b44db489f9aeff39...

The pipeline I manage is many repos rather than a monorepo, and maintaining long-lived checkouts in this context is not really realistic. What does work, and is very fast, is just grabbing tarballs: GitLab and GitHub both cache them, so they don't cost additional compute after the first time, and downloading them is strictly less transfer and fewer round trips than the git protocol.
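For concreteness, both forges serve archives at predictable URLs; the repo and tag below are placeholder names, not real projects:

```shell
# Placeholder repo and ref (assumptions for illustration only).
repo="example-org/example-repo"
ref="v1.2.3"

# GitHub and GitLab archive URL shapes:
github_url="https://github.com/$repo/archive/$ref.tar.gz"
gitlab_url="https://gitlab.com/$repo/-/archive/$ref/$(basename "$repo")-$ref.tar.gz"

echo "$github_url"
# Fetching is then a single HTTP request, no ref negotiation:
#   curl -fsSL "$github_url" | tar -xz
```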

The only real cost is that anything at build time that needs VCS info (e.g., to embed it in the binary) will need an alternate path, for example allowing it to be passed in via an environment variable.
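A minimal sketch of such an alternate path, assuming a variable named GIT_SHA (the name is an assumption, not a convention from the comment): read the revision from the environment when building from a tarball, and fall back to git when a real clone is present.

```shell
# vcs_revision: hypothetical helper. Prefers an injected GIT_SHA
# (tarball builds, where there is no .git directory), falls back to
# asking git (normal clones), and degrades to "unknown" otherwise.
vcs_revision() {
    if [ -n "${GIT_SHA:-}" ]; then
        echo "$GIT_SHA"
    else
        git rev-parse HEAD 2>/dev/null || echo unknown
    fi
}

GIT_SHA=0123abc vcs_revision   # prints 0123abc
```

CI would export GIT_SHA alongside the tarball download, so the build itself never needs to touch git.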

mixmastamyk | 5 years ago

A new checkout is good practice. Using refspec and depth options can make it quick.
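One common way to do that, sketched against a local placeholder remote so it runs offline (the branch name and paths are assumptions): clone a single branch at depth 1, so the fresh checkout transfers one commit instead of the full history.

```shell
set -e
# Local placeholder remote with two commits on "main".
tmp=$(mktemp -d)
git init -q --bare "$tmp/remote.git"
git clone -q "$tmp/remote.git" "$tmp/seed"
( cd "$tmp/seed" &&
  git -c user.name=ci -c user.email=ci@example.com \
      commit -q --allow-empty -m one &&
  git -c user.name=ci -c user.email=ci@example.com \
      commit -q --allow-empty -m two &&
  git push -q origin HEAD:main )
git -C "$tmp/remote.git" symbolic-ref HEAD refs/heads/main

# The quick fresh checkout: one branch, one commit of history.
# (file:// is used so --depth applies to a local remote.)
git clone -q --depth 1 --branch main --single-branch \
    "file://$tmp/remote.git" "$tmp/build"

git -C "$tmp/build" rev-list --count HEAD   # prints 1, not 2
```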