top | item 40642699

(no title)

mythhabit | 1 year ago

Some of the main benefits of shipping every day, is:

  1. Develop a system where you can turn codepaths on and off with a toggle.
  2. Become better at architecture. Learning to split a task into smaller chunks that are easier to reason about, both overall and individually.
  3. Learn to do multi phase increment.
  4. Develop an automatic deploy/rollback system.
All are good practices, and essential if you need HA. It also gets the velocity into your daily work, so if you need to deploy extra logging, a bugfix ect, you can do it in minuts and not hours/days/weeks.

Can you do all of that and ship on a weekly basis? Absolutely, I just haven't met anyone that do that.

discuss

order

chipdart|1 year ago

> Can you do all of that and ship on a weekly basis? Absolutely, I just haven't met anyone that do that.

I'd go as far as to claim that you cannot, at least at a level close to a highly available system updated with full CD without any gating or manual intervention.

Weekly deployments implicitly lead to merge trains and large PRs and team members sitting on their ass to approve changes. Each deployment is huge and introduces larger changes which have a larger risk profile. As deployments are a rare occurrence, teams don't have incentives to invest in automated tests or mechanisms to revert changes, which leads you to a brittle system and pressure to be conservative with the changes you do. As deployments are rare, breaking deployments becomes a major issue and a source of pressure.

To understand this, we only need to think about the work that it takes to support frequent deployments. You need to catch breaking changes early, so you feel the need to increase test coverage and invest in TDD or TDD-like development approaches. You also feel the need to have sandbox deployments available and easy to pull off. You feel the need to gate deployments between stages if any automated test set breaks. You feel the need to improve testing so that prod does not break as often. If prod breaks often,you also feel the need to automate how deployments are rolled back and changes are reverted. You also feel the need to improve troubleshooting and observability and alarming to know if and hoe things break, and improve development workflows and testing to the ensure those failures don't happen again.

You get none of this if your project deploys at best 3 or 4 times a month.

Another problem that infrequent deployments causes is team impedance. You need more meetings to prepare for things which would otherwise be automated away. You have meetings for deployments for specific non-prod stages, you have meetings to plan rollback strategies, you have meetings to keep changelogs, you have meetings to finalize release versions, you have meetings to discuss if a change should go on this or that release cycle, etc. bullshit all around.

Arainach|1 year ago

If your tests only run as part of deploy, you've already lost.

So long as you have a good set of integration tests and the right staging environments to run those tests against, when you ship doesn't matter. I've worked on multiple teams with test suites that give very high confidence what changes are good and are not; most tests were fast and could run before code was merged and block submission. The expensive integration tests ran against a staging environment, and when they started failing it was very quick to identify the range of CLs there and understand what to rollback/fix.

For most of my time there those services only pushed to prod twice a week - sometimes less if the integration tests were failing on the day of one of the deploys. Not every day, not every commit. And yet we had all of those benefits that you claim are impossible. No list of idle people waiting to approve changes, no "huge" deployments, infrastructure for automated tests and more. There are no meetings - those two weekly rollouts are entirely automated infrastructure, and unless an abnormal error rate is detected during the canary proceed entirely without human involvement.

The world didn't fall down. Customers didn't ask us where things were. Oncall appreciated that there weren't constant builds at risk of breaking and falling over - they only had to monitor rollouts twice a week and otherwise could focus on failing tests, logs, and alerts.

zarathustreal|1 year ago

> 1. Develop a system where you can turn codepaths on and off with a toggle.

> All are good practices…

I beg to differ on this point actually. It’s very difficult to get right and leads to subtle bugs (or even potential security vulnerabilities). It also pollutes the codebase with conditional statements that aren’t related to the business logic and makes it harder to read. Avoid feature flags.