If you're working at a large company and downtime is extremely expensive, this checklist is a good guide. Otherwise, if you have good test coverage, you can get by with something simpler. It's super rare to have a breaking change in Go.
We do quarterly upgrades of all services in a monorepo (about 20-30). The steps are basically this:
- Upgrade all dependencies to their latest versions, fixing build and test breaks (I read release notes for Go, but not for dependencies)
- Look for deprecated packages and replace them
- Upgrade all toolchains, including CI/CD containers, go.mod, etc.
- Run all tests
- Deploy to the test environment and make sure everything is green
- Deploy to staging and do some sanity checks
- Deploy to prod, keeping an eye on metrics for an hour or two
We're on k8s and the state of all clusters (i.e. which images are running) is tracked in git, so a rollback is just git revert + apply.
In practice, after about four years of this, we've seen maybe a dozen build breaks, and I can only remember one regression caused by a breaking change in a library[1].
Quarterly upgrades, four years: 16 upgrades. A dozen build breaks means that 75% of upgrades hit a build break.
Since that's spread over the total number of builds for 20-30 services, it's not that bad; if anything, it means some upgrades of everything were completely uneventful!
Out of curiosity, were you dealing with microservices defined within a monorepo, or microservices each in their own repo? The steps here:
> Build your binaries with the new version. Go through the build errors if any.
> Run all the unit tests with the new version. Go through the test failures.
are a lot easier in a monorepo.
Separately, I've experienced frequent breaking changes in the golangci-lint configuration file. I can't point to a specific instance of this happening but one thing I'd suggest is pinning your version of golangci-lint in development and in CI rather than using "latest".
Golang's backwards compatibility and simplified toolchain are some of my favorite parts about it. Bumping go.mod and downloading the new version of Go is usually all it takes!
Not Hakan, but I was working closely with him at the time. Lyft was on a microservice many-repo setup, and we did pin the version of golangci-lint.
I've found it's actually not so bad to do this kind of work across many repos, as long as you have the tooling to apply the same change to any number of codebases all at once. Our strategy was typically:
- Write an idempotent codemod to do an upgrade. This is easy as long as your configuration is in a declarative language.
- Regularly apply it or update it on all of the applicable repos.
- Merge upgrades incrementally until you've upgraded 100%.
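To make the "idempotent codemod" idea concrete, here is a minimal sketch in Go that bumps the `go` directive in a go.mod file. The function name `BumpGoDirective` and the regex-based approach are assumptions for illustration, not the tooling described above; a real codemod would more likely parse the file properly (for example with golang.org/x/mod/modfile) and would also touch CI images and the `toolchain` line.

```go
package main

import (
	"fmt"
	"regexp"
)

// goDirective matches a line such as "go 1.21" or "go 1.21.5" in go.mod.
// It deliberately does not match the separate "toolchain go1.x.y" line.
var goDirective = regexp.MustCompile(`(?m)^go[ \t]+\d+(\.\d+)*[ \t]*$`)

// BumpGoDirective (hypothetical helper) rewrites the go directive to target.
// It is idempotent: applying it to its own output changes nothing.
func BumpGoDirective(gomod, target string) (out string, changed bool) {
	out = goDirective.ReplaceAllLiteralString(gomod, "go "+target)
	return out, out != gomod
}

func main() {
	before := "module example.com/svc\n\ngo 1.21\n\nrequire example.com/dep v1.2.3\n"

	after, changed := BumpGoDirective(before, "1.24")
	fmt.Print(after)
	fmt.Println("changed:", changed)

	// Running the codemod again is a no-op, which is what makes it safe
	// to re-apply across every repo on a schedule.
	_, changedAgain := BumpGoDirective(after, "1.24")
	fmt.Println("changed on second run:", changedAgain)
}
```

Because the second run reports no change, the same job can be pointed at repos that were already upgraded without producing empty or duplicate pull requests.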
I’ll add an item that is not yet on our checklist but has already bitten us several times: check your code generation. Since code generation is so popular in the Go ecosystem, we’ve got 5 or 6 different codegen tools that update on various timelines. Twice now we’ve gone through a checklist similar to this article, patted ourselves on the back, and a week later found out no one can regenerate any code.
This is one reason why code generation should run as part of the build process, every time, even if you decide to check in the generated code for visibility.
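One way to catch this in CI is to regenerate everything and fail if the working tree changes. The sketch below is a rough illustration under stated assumptions: the tool name `gencheck` and the helper names are made up, the repo is assumed to be a clean git checkout before the check runs, and every generator is assumed to be wired up through `//go:generate` directives.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// dirtyFiles extracts paths from `git status --porcelain` output, where each
// line is a two-character status code, a space, and then the path.
func dirtyFiles(porcelain string) []string {
	var files []string
	for _, line := range strings.Split(porcelain, "\n") {
		if len(line) > 3 {
			files = append(files, line[3:])
		}
	}
	return files
}

// regenerate runs `go generate ./...` in dir and returns every file git now
// reports as modified or untracked. On a clean checkout those are exactly the
// files whose checked-in contents no longer match the generators.
func regenerate(dir string) ([]string, error) {
	gen := exec.Command("go", "generate", "./...")
	gen.Dir = dir
	if out, err := gen.CombinedOutput(); err != nil {
		return nil, fmt.Errorf("go generate failed: %v\n%s", err, out)
	}
	status := exec.Command("git", "status", "--porcelain")
	status.Dir = dir
	out, err := status.Output()
	if err != nil {
		return nil, fmt.Errorf("git status failed: %v", err)
	}
	return dirtyFiles(string(out)), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: gencheck <repo-dir>")
		return
	}
	stale, err := regenerate(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if len(stale) > 0 {
		fmt.Fprintln(os.Stderr, "generated code is out of date:", strings.Join(stale, ", "))
		os.Exit(1)
	}
	fmt.Println("generated code is up to date")
}
```

A failing `go generate` is reported separately from stale output, so the "no one can regenerate any code" case shows up on the upgrade branch rather than a week later.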
Another suggestion: if your monorepo's service packaging is sufficiently uniform, build every service against both Go versions, package both binaries into the deploy artifact, and install a feature flag that lets you select which binary to boot when the service starts. This also lets you canary an arbitrary percentage of the fleet with the new Go version, and you can execute a version rollback by redeploying (without needing to revert any commits).
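A launcher for that setup could look roughly like the sketch below. Everything specific here is an assumption for illustration: the `NEW_GO_PERCENT` environment variable stands in for whatever feature-flag system you use, the binary paths come from the command line, and `syscall.Exec` limits it to Unix-like systems.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"os"
	"strconv"
	"syscall"
)

// chooseBinary returns the binary to boot. percent is the share (0-100) of
// hosts that should run the build made with the new Go version. The host name
// is hashed so a given host makes the same choice on every restart.
func chooseBinary(oldPath, newPath, host string, percent int) string {
	h := fnv.New32a()
	h.Write([]byte(host))
	if int(h.Sum32()%100) < percent {
		return newPath
	}
	return oldPath
}

func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: launcher <old-binary> <new-binary> [args...]")
		return
	}
	// NEW_GO_PERCENT is a hypothetical flag; unset or malformed means 0,
	// so the fleet stays on the old build by default.
	percent, err := strconv.Atoi(os.Getenv("NEW_GO_PERCENT"))
	if err != nil {
		percent = 0
	}
	host, _ := os.Hostname()
	bin := chooseBinary(os.Args[1], os.Args[2], host, percent)

	// Replace the launcher process with the chosen service binary.
	argv := append([]string{bin}, os.Args[3:]...)
	if err := syscall.Exec(bin, argv, os.Environ()); err != nil {
		fmt.Fprintln(os.Stderr, "exec failed:", err)
		os.Exit(1)
	}
}
```

Setting the flag back to 0 and restarting the pods is the rollback; no commit needs to be reverted because both binaries ship in the same artifact.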
As of Go 1.24, the Go team has not published a tool that identifies every case broken by this change, so you may need to review the code by eye.
I'm not sure you can actually break code with the new for-loop semantics (I mean, in real life situations). It can probably fix some buggy code in the wild, but I have a hard time believing anyone would voluntarily write code relying on the old semantics of loop variables being reassigned instead of reinitialized.
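For anyone unsure what changed, here is a small sketch of the two behaviours. Since the semantics you actually get depend on the `go` version declared in the module's go.mod, both functions emulate their behaviour explicitly so the result is the same on any Go version; the function names are made up for the example.

```go
package main

import "fmt"

// sharedVariable emulates the pre-1.22 behaviour: the loop variable is
// declared once, outside the loop, so every closure captures the same
// variable and sees its final value.
func sharedVariable() []int {
	var fns []func() int
	var i int
	for i = 0; i < 3; i++ {
		fns = append(fns, func() int { return i })
	}
	return call(fns)
}

// perIteration emulates the 1.22+ behaviour: each iteration gets its own
// copy of the variable, so each closure sees the value from its iteration.
func perIteration() []int {
	var fns []func() int
	for i := 0; i < 3; i++ {
		i := i // explicit per-iteration copy, the classic pre-1.22 workaround
		fns = append(fns, func() int { return i })
	}
	return call(fns)
}

// call runs each closure after the loop has finished and collects the results.
func call(fns []func() int) []int {
	out := make([]int, 0, len(fns))
	for _, f := range fns {
		out = append(out, f())
	}
	return out
}

func main() {
	fmt.Println(sharedVariable()) // [3 3 3]
	fmt.Println(perIteration())   // [0 1 2]
}
```

Code written the ordinary way, with `for i := 0; ...` and no explicit copy, behaves like `sharedVariable` in a module declaring a Go version below 1.22 and like `perIteration` from 1.22 on, which is why bumping go.mod can change behaviour without any source edit.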
> With the introduction of generics at 1.18, many linters lacked support for generics for months. We delayed the upgrade due to this issue.
I wouldn't plan on using a new feature in production in the release that introduced it. Why would you plan to be using generics on day one?
> There was talk of trying to solve this issue in the upstream ourselves.
Was there a genuine business case that would make Lyft more profit if they used generics? If not then why would you even consider this?
> Fortunately, by the time we seriously started exploring this option, linter support was added and go 1.19 was also released. We eventually upgraded directly to 1.19 from 1.17 but we were around 10 months late.
You weren't late. You were precisely on time. This is some odd project mentality.
physicles|1 year ago
We do quarterly upgrades of all services in a monorepo (about 20-30). The steps are basically this:
- Upgrade all dependencies to their latest versions, fixing build and test breaks (I read release notes for Go, but not for dependencies)
- Look for deprecated packages and replace them
- Upgrade all toolchains, including CI/CD containers, go.mod, etc.
- Run all tests
- Deploy to the test environment and make sure everything is green
- Deploy to staging and do some sanity checks
- Deploy to prod, keeping an eye on metrics for an hour or two
We're on k8s and the state of all clusters (i.e. which images are running) is tracked in git, so a rollback is just git revert + apply.
In practice, after about four years of this, we've seen maybe a dozen build breaks, and I can only remember one regression caused by a breaking change in a library[1].
[1] https://github.com/golang/go/issues/24211
nine_k|1 year ago
Since that's spread over the total number of builds for 20-30 services, it's not that bad; if anything, it means some upgrades of everything were completely uneventful!
peterldowns|1 year ago
> Build your binaries with the new version. Go through the build errors if any.
> Run all the unit tests with the new version. Go through the test failures.
are a lot easier in a monorepo.
Separately, I've experienced frequent breaking changes in the golangci-lint configuration file. I can't point to a specific instance of this happening but one thing I'd suggest is pinning your version of golangci-lint in development and in CI rather than using "latest".
Golang's backwards compatibility and simplified toolchain are some of my favorite parts about it. Bumping go.mod and downloading the new version of Go is usually all it takes!
crockeo|1 year ago
I've found it's actually not so bad to do this kind of work across many repos, as long as you have the tooling to apply the same change to any number of codebases all at once. Our strategy was typically:
- Write an idempotent codemod to do an upgrade. This is easy as long as your configuration is in a declarative language.
- Regularly apply it or update it on all of the applicable repos.
- Merge upgrades incrementally until you've upgraded 100%.
mseepgood|1 year ago
Doesn't it auto-download when you bump go.mod nowadays?
et1337|1 year ago
jzwinck|1 year ago
lopkeny12ko|1 year ago
tapirl|1 year ago
Be careful when you do this. Go changed its for-loop semantics in Go 1.22. When you change the Go version from below 1.22 to 1.22 or later, your Go code may break: https://go101.org/blog/2024-03-01-for-loop-semantic-changes-... (It is long. A short summary of the important points is here: https://github.com/golang/go/issues/66156)
As of Go 1.24, the Go team has not published a tool that identifies every case broken by this change, so you may need to review the code by eye.
thiht|1 year ago
Honestly I think this is a non-issue.
timewizard|1 year ago
I wouldn't plan on using a new feature in production in the release that introduced it. Why would you plan to be using generics on day one?
> There was talk of trying to solve this issue in the upstream ourselves.
Was there a genuine business case that would make Lyft more profit if they used generics? If not then why would you even consider this?
> Fortunately, by the time we seriously started exploring this option, linter support was added and go 1.19 was also released. We eventually upgraded directly to 1.19 from 1.17 but we were around 10 months late.
You weren't late. You were precisely on time. This is some odd project mentality.