top | item 21587641

(no title)

dkoston | 6 years ago

Thanks, I was hoping for more of this in the blog post. Since tools are just an expression of process/policy, it’s more interesting to here about the process and why than it is about building “yet another CD tool”. Appreciate the thoughtful and thorough response.

The major pain point I agree with on develop is changing the defaults to merge to that rather than master. It’s a shame this is not easier to do in git/github.

I’m not sure I agree with “develop can still be broken” as an issue that supports a queue. Whether it’s a queue or develop, one should run CI on each change to validate that merging it to master will not cause issues. It’s possible for both to be broken via the same scenarios just as it’s possible for master to be broken. Since CI runs before the branch is merged to develop and upon merge, a failure would “stop the world” and prevent more code from being merged unless that code fixes the failure.

I guess I’m not fully understanding how a queue prevents this. Since you don’t have a full picture of the state of master until something is merged from the queue, how do the CI checks in the queue prevent things that branch-based CI checks wouldn’t prevent in a “develop” branch? With branches and develop, pull requests remain open until they can be assured they merge properly with develop as well.

For clarity, I’m not arguing that a develop branch is the way to go, I think CD is much better.

Maybe I’m missing something big here but using multiple branches is permissible in other setups also. You can cherry pick a bunch of commits to a branch and test permutations but only certain branches get deployed to staging and production based on rules.

I’m glad that Shopify has found tools and a process that works. Honestly, I’m just having trouble comparing and constraining this to the other tools that are out there. The article never speaks about other approaches and whether or not they were considered and why you decided to go with a queue. It’s not clear to me if this was a case of improving the existing queue system because it was already in place or whether or not the queue was specifically chosen again because it was better than other alternatives (and why).

discuss

wvanbergen|6 years ago

> I guess I’m not fully understanding how a queue prevents this. Since you don’t have a full picture of the state of master until something is merged from the queue, how do the CI checks in the queue prevent things that branch-based CI checks wouldn’t prevent in a “develop” branch? With branches and develop, pull requests remain open until they can be assured they merge properly with develop as well.

The trick of the merge queue is that it splits the "merging a branch / pull request" in two steps:

1. Create a merge commit with master and your PR branch as ancestors.

2. Update the `master` ref to point to the merge commit.

Normally when you press the "Merge Pull Request" button, it will do those two things in one go. By splitting it up in two distinct steps, we can run CI between step 1 and 2, and only fast-forward master if CI is green.

This means that master only ever gets forwarded to green commits. And because the sha doesn't change during a fast-forward, all the CI statuses are retained. Only when we fast-forward will GitHub consider to pull request merged, so we don't have to "undo" pull request merges when they fail to integrate. If the merge commit fails to build successfully, we leave a comment on the PR that merging failed, and the PR is still open.

When we have multiple PRs in the queue, we can create merge commit on top of merge commit, and run CI on those merge commits. When once of these CI runs comes back, we can fast forward master to it, and potentially merge multiple pull requests at once with this approach.

dkoston|6 years ago

I think I see where you are coming from. Being as we use different tools, we wouldn’t allow a pull to be merged if it wasn’t up-to-date with master which is similar but a different approach. You’ll have to check at merge time because getting up-to-date could take a while and master could have changed. Jenkins does this and it can be done in other CI/CD systems with a bit of custom code.

I’d imagine at 1,000 developers and with a monolithic codebase, you’re looking to minimize test runs both from a time and cost of runners perspective.

You may also want to look into Zuul or Bazel if cost of test suite runs is a factor in coming to this solution.