
Fulfilling the Promise of CI/CD

71 points | kiyanwang | 5 years ago | stackoverflow.blog

77 comments

[+] numlocked|5 years ago|reply
“ If any engineer on your team can merge a set of changes to main, secure in the knowledge that 15 minutes later their changes will have been tested and deployed to production, with no human gates or manual intervention required, then congratulations! You’re doing CI/CD.”

We absolutely do this and I don’t think we are THAT unusual.

Also, as you hit larger scale it’s questionable whether this is still a good idea. We are about 50 engineers and our CD means we release to production about 10 times a day, on average. At that pace, some tricky things start to happen: a metric starts to move sideways and it’s hard to actually correlate it to a release - you may have to roll back across multiple releases, which can create a fairly confusing situation for the team.

[+] dunreith|5 years ago|reply
I suppose there's an even larger scale at which the law of large numbers applies and you can more reasonably filter out "the noise".
[+] jasonpeacock|5 years ago|reply
If you want to get more into the details about the benefits of CI/CD and other best practices, Accelerate[0] is a great book.

It's especially helpful for getting the data and forming the arguments you need to persuade leadership that CI/CD is a good thing.

[0] https://smile.amazon.com/Accelerate-Software-Performing-Tech...

[+] tenaciousDaniel|5 years ago|reply
Can confirm, they really took a legitimate scientific approach and years of research in order to arrive at their results. Excellent book.
[+] necovek|5 years ago|reply
I've been on teams and projects doing CI and CD for the last... 15 years.

Sure, our deployment process took 10-15 minutes in the best of cases (because deployments for different branches were serialized), but where does the author's impression that the collective "we" are not doing CD come from?

[+] mborch|5 years ago|reply
"Very few teams are actually practicing CI/CD" is mentioned in the article.

In an enterprise setting, due to segregation of duties, it's often true that CD is impossible due to restrictive change management processes that result in unpredictable and typically week-long cycles.

[+] austincheney|5 years ago|reply
Deployment means shipping working code, but in my professional experience most people conflate it with publication. Publication is about making something available to an audience, whether internal or external, which is much more than just adding a functional piece in another place. This is just the start of getting a deployment process terribly wrong, but every other mistake seems to follow from it.
[+] tenaciousDaniel|5 years ago|reply
A 10-15 minute deployment process sounds very good, if you take the Accelerate research as a baseline.
[+] mrdonbrown|5 years ago|reply
As my startup [1] is in the domain of CI/CD, I've been doing a bunch of customer development interviews to better understand how teams deliver software. I was surprised by how few teams use full Continuous Delivery, even at cutting-edge tech companies. It is most often used by small teams, even within large companies, delivering internal backend services.

The most common pattern seems to be auto-deploy to staging or a dev environment, with some sort of daily or weekly process for promoting to production. One company built a Slack-based approval process using +1 or -1 reactions, and another holds a Zoom meeting that every author has to attend and where each is walked through a checklist before the release is approved.

My team also had a manual approval step to production, which in theory meant the dev would check logs, dashboards, and alerts before approving; but in practice, both for us and for the teams I talked to, that happens only about 50%-80% of the time.

What we built into our product, Sleuth [1], is a way to automatically promote staging releases when the staging release was determined to be healthy and soaked for a minimum amount of time. This allows the 80% case to simply flow through to prod without developer babysitting, whereas we can easily interrupt the process with a -1 reaction in Slack if it needs more manual testing. I think this is the ideal - the common case is the code flows but you still have an easy way to interrupt the process when the change needs it.

[1] https://sleuth.io
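The auto-promotion gate described above could be sketched roughly like this. All names and the soak threshold are made up for illustration; this is not Sleuth's actual code:

```python
import time

# Hypothetical promotion gate: promote a staging release once it is healthy
# and has soaked long enough, unless someone vetoed it (e.g. a -1 in Slack).
MIN_SOAK_SECONDS = 30 * 60  # assumed minimum bake time in staging

def should_promote(deployed_at, healthy, vetoed, now=None):
    """Promote to prod only if the release is healthy, fully soaked,
    and nobody has objected."""
    now = time.time() if now is None else now
    soaked = (now - deployed_at) >= MIN_SOAK_SECONDS
    return healthy and soaked and not vetoed
```

The interesting property is that "do nothing" results in a promotion, while a single objection is enough to pause the pipeline.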

[+] marcosdumay|5 years ago|reply
All these years it has been in focus, I still don't understand what problem CD is expected to solve.

Yes, there is value in CI, and there is plenty of value in very low-effort deployment. But, aside from a large downside risk, what exactly does automatic deployment bring?

[+] hctaw|5 years ago|reply
I disagree with some of the conclusions.

CD is avoided because most software products should not be delivered continuously; continuous delivery is inversely correlated with software stability for customers. It may create value internally, but if it doesn't create value externally, then even if it's cheap to roll out it's probably not worth it.

Slightly related, I think CI/CD is today where VCS was before git. It all kinda sucks, and no startup or company is going to fix it (nor can they, it must be free as in beer and speech). We need something that sucks less and becomes as standard a tool as git or make.

[+] choeger|5 years ago|reply
Good point. Imagine if the developers of Postgres or Mongo did CD to your database.

IMO, the idea of CD only works for companies that control the production environment and basically offer a service, not a product. If that's you, you may get a great deal out of CD.

But if you have a product, and if that product manages your customer's data, is integrated with your customer's scripts, and runs in an environment that you don't completely control, you had better think twice before continuously updating the system. You can still do it, but it will be expensive to get right.

[+] dunreith|5 years ago|reply
I led our team to full CI/CD (Deployment), and while it felt really counterintuitive to move quickly and have the occasional bug show up for our customers, the speed at which we could fix bugs vastly offset it. Introduce a bug? Fix it the same day. Introduce a breaking change? Roll back and fix it. No biggie.
[+] vincnetas|5 years ago|reply
Have you had data corruption from bugs that could not be rolled back quickly?
[+] Macha|5 years ago|reply
2016 was the last time I was on a team I consider not to be doing CD, in the sense that you hit merge and it goes to prod without manual intervention if automated tests pass.

However, I'm sure some people would argue that some of these teams are not CD enough due to items like the following:

- One team had very long (4+ hour) test suites. End result: devs hit merge in the morning to see their changes live that day, and didn't if a test failed (and of course there were flaky tests). Eventually there was a project to pare down unneeded tests and address flakiness, but by the time I left it was still a 2-hour (though more reliable) path from merge to deploy. No human intervention needed, though.

- Release windows. The release pipeline on one team only ran 9-5, Monday to Thursday. If you merged after 5 it kicked off the following morning. If you merged on Friday, it kicked off Monday morning. I'd accept a claim that the team was only doing CD 4 days of the week, but saying they weren't doing CD at all is overstating it.

- Code reviews as a requirement for merge. I still think this doesn't disqualify it, because the developer still chooses when to hit merge after review and goes to production without anyone else's intervention. On 2 of 3 teams, the CI pipeline that ran on branches was also sufficiently reliable that if your branch build passed, you could be fairly sure the release build would pass too.

[+] k__|5 years ago|reply
Anyone got some good resources on CD?

I think, especially in the cloud, I'd fear updating a database automatically and losing data. Some of the resources in AWS/CloudFormation are "replaced" rather than "updated", which gives me a bit of paranoia.

[+] signal11|5 years ago|reply
I’d start with reading the Phoenix Project (a “novel” about Devops) and also Accelerate. Also the Continuous Delivery book by Dave Farley and Jez Humble.

Beyond that however it’s about taking a look at your processes and your particular challenges.

Say you have an old-school DB with lots of stored procs, no (or not enough) tests, one that people tremble to update: that DB is a business risk.

You should have backups and contingency plans (eg a hot standby?) to ensure updates are resilient even if something fails. And of course the “good hygiene” of adding columns, using feature flags etc as the other commenter wrote.

Once you’re less worried about the database, you can start refactoring it if you wish.

Refactoring a large, entrenched database is a big topic, but for a publicly available case study, have a look at Netflix's write-up on how they migrated their billing system away from Oracle[1].

[1] http://techblog.netflix.com/2016/06/netflix-billing-migratio...

[+] solumos|5 years ago|reply
Writing data migrations that fail safely is tricky, and requires some extra care. Temp tables, adding new columns instead of modifying, feature flags, etc. all help with this. Basically, you want to run stuff in production with less risk.
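The "add new columns instead of modifying" pattern can be sketched with an in-memory SQLite table. Table and column names here are made up for illustration:

```python
import sqlite3

# Expand/contract migration sketch: add a new nullable column and backfill,
# rather than altering the existing column in place.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice'), ('bob')")

# Step 1 (expand): add the new column as nullable, so old code keeps working.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Step 2 (backfill): populate the new column; in production you'd do this
# in small batches behind a feature flag.
conn.execute("UPDATE users SET display_name = name WHERE display_name IS NULL")

# Step 3 (contract) happens in a later release, once no code reads `name`.
rows = conn.execute("SELECT name, display_name FROM users ORDER BY id").fetchall()
print(rows)  # [('alice', 'alice'), ('bob', 'bob')]
```

Because each step is backward compatible, any one of them can fail or be rolled back without breaking the running application.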

On the AWS/CloudFormation side, "Cattle, not pets" is the motto[0]. Terraform specifically is really good at capturing the desired vs current state of your infra, and showing you what those changes will be before you apply them. Point being - as long as your automation is captured in Infra-as-code, you shouldn't care too much if something gets destroyed.

[0] http://cloudscaling.com/blog/cloud-computing/the-history-of-...

[1] https://www.terraform.io/

[+] jeremy_k|5 years ago|reply
"Is it hard? Yes, it is hard. But I hope I’ve convinced you that it is worth doing. It is life changing. It is our bridge to the sociotechnical systems of tomorrow, and more of us need to make this leap. What is your plan for achieving CI/CD in 2021?"

At Release[0] we discuss this on our Build vs Buy[1] page, specifically in the section "Is building a PaaS your core competency?" Companies should be focused on building their product and delivering its value to their customers, not spending years building out a platform to achieve CD.

My hope is that people's plans for achieving CI/CD in 2021 include looking at all the companies working in this space and giving them a chance, rather than trying to spin something up on their own.

* If it wasn't clear, I work at Release, so I'm letting my bias be known *

[0] https://releaseapp.io [1] https://releaseapp.io/build-vs-buy

[+] slumdev|5 years ago|reply
The "Delivery/Deployment" distinction is baseless and used as a justification for not shipping by people who are attached to old and burdensome change management processes.

Software not in the customer's hands hasn't been deployed OR delivered.

Alternatively: Software undeployed is software undelivered.

[+] signal11|5 years ago|reply
Note for people attached to old and cumbersome change management processes, eg ITIL:

ITIL’s latest iteration, ITIL v4, has a track for “high velocity IT” which incorporates rapid release cadence, CD, etc and is described by them as suitable for “digital” organisations or organisations going through digital transformation.

I laughed a bit at this because it's just ITIL playing catch-up, but still, it's a useful data point for getting some 'stuck in the past' people to see that the "old Enterprise IT ways" are no longer unchallengeable.

[+] changemgmtproc|5 years ago|reply
I've always found the change management tooling backwards. It forces the developer to aggregate the data so that auditors have an easier life when they check a single change once a year. Why doesn't the tooling just hook into the existing data sources that already provide this data for auditors, like PRs, testing data, JIRA tickets, etc.?
[+] OliverGilan|5 years ago|reply
This article addresses an issue I currently have but doesn't actually explain how to fix it. I recently set up a simple home server to use as a testing ground for side projects before eventually porting them to a cloud service.

As a result I looked into building a CI/CD pipeline for the first time and practically every article I came across talking about “CI/CD” really just talked about CI.

Even today I have no idea how to easily automate deployments. The only service I know of that does this out of the box is Heroku. Ideally I should be able to push changes to a master branch on GitHub and have them automatically deployed to my server. How to do this is, in my experience, poorly documented, and it's certainly discussed far less than CI solutions.
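For the GitHub-to-home-server case, one low-tech option is a tiny webhook receiver on the server that runs your deploy script whenever GitHub reports a push to master. A sketch, where the port and the `deploy.sh` script are placeholders for whatever your setup uses:

```python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

def should_deploy(payload):
    # GitHub push payloads carry the updated ref; only react to master.
    return payload.get("ref") == "refs/heads/master"

class DeployHook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        if should_deploy(payload):
            # Placeholder deploy script: e.g. git pull, rebuild, restart.
            subprocess.Popen(["./deploy.sh"])
        self.send_response(204)
        self.end_headers()

# To run on the server (then point a GitHub webhook at this machine):
# HTTPServer(("", 8080), DeployHook).serve_forever()
```

A real version should also verify GitHub's webhook signature before running anything; this sketch omits that for brevity.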

[+] avgDev|5 years ago|reply
I implemented CI/CD pipeline for a project I am working on recently. I used Azure DevOps and the app is self-hosted on our servers. It took some time to figure out but I got it working after some trial and error.

Right now, it is triggered by a push to the "master" branch: the pipeline builds the app, runs tests, and creates artifacts (files to be used for release/deployment). Then the release pipeline is triggered; this builds a "release" and deploys the application to the "staging" environment used for user testing. Once the application is approved, I go to Azure DevOps and confirm deployment to "Production". There are many different approaches, triggers, and settings.

Check out Azure DevOps they have a lot of good information on CI/CD pipelines.

[+] akiselev|5 years ago|reply
In our production system, CircleCI has Github and AWS credentials. When CI is done building and testing the Docker image, it pushes it to a private container registry and updates the terraform infrastructure repo with the new container hash ID (either through a tfvar file or through env variables). Then another CircleCI job runs `terragrunt apply` in that repo to deploy to staging (prod/stage/dev are separate folders, automated jobs only update staging). Deploying from staging to prod is manual, by copying the container hash from staging to prod and pushing to master.
[+] jasonpeacock|5 years ago|reply
That's because building software & running tests is somewhat standardized, but deploying is very specific to each application's environment (and requires additional credentials, integration with monitoring, automated rollback, etc).

Some platforms provide this as part of their feature set (like AWS).

You talk about your side-projects - how do you deploy them today? Write a script for that and think about how many unique-to-yourself edge cases you are handling. If you believe your solution is generic and re-usable, then build that into a tool/platform for everyone and profit!

[+] kkapelon|5 years ago|reply
>As a result I looked into building a CI/CD pipeline for the first time and practically every article I came across talking about “CI/CD” really just talked about CI.

You don't mention what technology you are using, but maybe you didn't find the correct resources?

https://codefresh.io/docs/docs/yaml-examples/examples/#deplo...

F.D. I work for Codefresh

[+] the_gipsy|5 years ago|reply
In my hobby project I just have a script that pushes Docker images and rsyncs the files (docker-compose.yml and some volume-mounted things), and then sshes to the server and restarts docker-compose.
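That push/rsync/restart sequence is easy to keep as a small script in the repo; a Python version of the same steps might look like this (host, directory, and file names are placeholders):

```python
import subprocess

HOST = "deploy@example.com"   # placeholder server
REMOTE_DIR = "/srv/app"       # placeholder remote directory

def deploy_steps(host=HOST, remote_dir=REMOTE_DIR):
    """Return the commands for the push/rsync/restart deploy described above."""
    return [
        ["docker", "compose", "build"],
        ["docker", "compose", "push"],
        ["rsync", "-az", "docker-compose.yml", f"{host}:{remote_dir}/"],
        ["ssh", host, f"cd {remote_dir} && docker compose pull && docker compose up -d"],
    ]

def deploy():
    for cmd in deploy_steps():
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

Wiring this same script into a CI job that runs on pushes to master is, in essence, the "CD" part that so many articles skip.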
[+] alfonsodev|5 years ago|reply
Where would you like to deploy? I've never had the problem you mention, but then I knew how to deploy manually before using CI/CD. The problem, if any, is usually translating what you already do to deploy manually into the CD tool's format/language and mechanism.

EDIT: I second what jasonpeacock says above; I think it's better expressed in those words.

[+] dlor|5 years ago|reply
In my experience the opposite is true - no one is doing CI. But that's only because the definition of CI is unrealistic/impossible for large teams, specifically this requirement: https://en.wikipedia.org/wiki/Continuous_integration#Everyon...

I've never seen a team that operates with every developer committing every day. Small commits merged into a stable trunk as often as possible, yes. But every developer merging code every single day is unrealistic, counter-productive, and incompatible with code review practices.

[+] lhorie|5 years ago|reply
> Everyone commits to the baseline every day

That's not a hard requirement, it's more like a principle.

In my experience, it's easier to review smaller things than gigantic PRs. I don't take this line item to mean literally committing every chronological day, but in the sense of not withdrawing from the world and building out entire systems in a cave, so to speak. It's a more granular version of "prefer agile over waterfall".

Another subtle aspect of the commit-often mantra - particularly when it comes to a workflow with tests running in CI - is that you are encouraged to build software in a bottom up fashion (simply because if you try to "tack on" an incomplete feature to an existing live system, it'll obviously not work).

[+] jdbernard|5 years ago|reply
I'm actually having this discussion with a client at this very moment. The client expects that check-ins to the baseline happen multiple times per day. On our distributed team, with junior and senior people in different locations, the ability to do async code reviews is critical to maintaining quality. Automated linting, unit testing, and other CD-friendly tools can't teach and enforce code quality the way we need to with our junior developers.
[+] rattray|5 years ago|reply
The author seems to be asking for some things that just don't make sense.

My impression is that they claim that "true" CD means each deployment is of a single author's changes, and that batching defeats the spirit/purpose of CD.

But many orgs commit to master faster than deploys can go out (e.g. every 15 minutes), especially if care is taken to ensure each change did not cause problems (detected automatically or manually).

I only skimmed parts; was there a solution to this problem mentioned? Or did I misunderstand?

[+] sergiotapia|5 years ago|reply
What strategies can a team implement for CD if deploys and merges to master take about an hour to deploy today?

Our problem is that we need to manually cut a release branch with the right version number, then go through logistical steps before we can merge.

Docker alone takes about 30 minutes to build.

[+] choeger|5 years ago|reply
I suggest a pull-based approach instead of GitLab/GitHub's push-based one. Have a regular job pick up the latest master branch, tag it correctly (take care to order your releases), build, package, ship, deploy. I would keep it synchronous: it doesn't make sense to build the latest commit when deployment takes hours. Just wait for the last deployment to finish before starting again.

Also, a pull-based approach lets you integrate multiple repos. I have never seen anyone from the CD camp talk about how a company should manage CD across n interdependent git repositories with the usual push-based approach.
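The pull model described above can be sketched as a polling loop on the deploy host: check the remote for a new master tip, deploy synchronously, and only look for newer commits once the previous deployment finishes. The repo location and interval are placeholders:

```python
import subprocess
import time

def latest_sha(repo_dir):
    # Pull model: the deploy job asks the remote for the tip of master.
    out = subprocess.run(
        ["git", "ls-remote", "origin", "refs/heads/master"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return out.split()[0] if out.strip() else ""

def needs_deploy(last_deployed, latest):
    return bool(latest) and latest != last_deployed

def poll_and_deploy(repo_dir, deploy, interval_s=60):
    deployed = ""
    while True:
        sha = latest_sha(repo_dir)
        if needs_deploy(deployed, sha):
            deploy(sha)        # tag, build, package, ship -- synchronously
            deployed = sha     # only now consider newer commits
        time.sleep(interval_s)
```

Because the loop is synchronous, intermediate commits that land during a long deployment are skipped; the next cycle simply deploys whatever is then the latest master.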

[+] xtracto|5 years ago|reply
I have always wondered how companies managed CI/CD infrastructure with PCI or SOC2 requirements, given that there have to be a lot of manual approvals and acknowledgements through the delivery process.
[+] telotortium|5 years ago|reply
Automate everything else but the approvals. Releases should still be built, tested, and pushed completely automatically, except that one of the release process steps is to present an "Approve" button to the release manager. After the release manager clicks the button, the rest of the release proceeds automatically.

Generally you would only need the manual approvals for prod. Dev, qa, staging, etc., can typically still be released completely automatically, so you just create a CI/CD infrastructure that can be used for both.