sourceless|1 year ago

I think unfortunately the conclusion here is a bit backwards; de-risking deployments by improving testing and organisational properties is important, but is not the only approach that works.

The author notes that there appears to be a fixed number of changes per deployment and that it is hard to increase - I think the 'Reversie Thinkie' here (as the author puts it) is actually to decrease the number of changes per deployment.

The reason those meetings exist is because of risk! The more changes in a deployment, the higher the risk that one of them is going to introduce a bug or operational issue. By deploying small changes often, you get to deliver value much sooner and fail smaller.

Combine this with techniques such as canarying and gradual rollout, and you enter a world where deployments are no longer flipping a switch and either breaking or not breaking - you get to turn outages into degradations.
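A rough sketch of what canarying plus gradual rollout looks like in practice (illustrative Python only; `get_error_rate` stands in for a real metrics query, and the stage percentages and error budget are made-up numbers):

```python
STAGES = [1, 5, 25, 50, 100]   # percent of traffic on the new version
ERROR_BUDGET = 0.01            # max tolerated error rate at each stage


def get_error_rate(percent_on_canary):
    """Stand-in for a real monitoring query (errors / requests)."""
    return 0.002  # pretend the canary is healthy


def canary_rollout():
    # Shift traffic in stages; abort and roll back at the first sign
    # of degradation instead of breaking 100% of users at once.
    for pct in STAGES:
        rate = get_error_rate(pct)
        if rate > ERROR_BUDGET:
            print(f"error rate {rate:.3f} at {pct}% -> rolling back")
            return "rolled back"
        print(f"{pct}% of traffic healthy (error rate {rate:.3f})")
    return "fully rolled out"


result = canary_rollout()
```

The key property is that a bad change degrades 1% of traffic rather than causing a full outage - exactly the "outages into degradations" trade described above.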

This approach is corroborated by the DORA research[0], and covered well in Accelerate[1]. It also features centrally in The Phoenix Project[2] and its spiritual ancestor, The Goal[3].

[0] https://dora.dev/

[1] https://www.amazon.co.uk/Accelerate-Software-Performing-Tech...

[2] https://www.amazon.co.uk/Phoenix-Project-Helping-Business-An...

[3] https://www.amazon.co.uk/Goal-Process-Ongoing-Improvement/dp...


motorest|1 year ago

> The reason those meetings exist is because of risk! The more changes in a deployment, the higher the risk that one of them is going to introduce a bug or operational issue.

Having worked both on projects that did full CD and on projects that had biweekly releases with release-engineer meetings, I can state with full confidence that risk management is correlated, but it is an indirect, secondary factor.

The main factor is quite clearly how much time and resources an organization invests in automated testing. If an organization has the misfortune of having test engineers who lack the technical background to do automation, they risk never breaking free of these meetings.

The reason why organizations need release meetings is that they lack the infrastructure to test deployments before and after rollouts, and they lack the infrastructure to roll back changes that fail once deployed. So they make up for this lack of investment by adding all these ad-hoc manual checks to compensate for the lack of automated ones. If QA teams lack any technical skills, they will push for manual processes as self-preservation.

To make matters worse, there is also the propensity to pretend that having to go through these meetings is a sign of excellence and best practices, because if you're paid to mitigate a problem you obviously have no incentive to fix it. If a bug leaks into production, that's a problem introduced by the developer that wasn't caught by QA because reasons. If the organization has automated tests, it's hard not to catch it at the PR level.

Meetings exist not because of risk, but because organizations employ a subset of roles that require risk to justify their existence and lack the skills to mitigate it. If a team organizes its efforts to add the bare minimum checks to verify that a change runs and works once deployed, and can automatically roll back if it doesn't, it does not need meetings anymore.
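The "bare minimum checks plus automatic rollback" loop described here can be sketched in a few lines (hypothetical Python; `deploy`, `health_check`, and `rollback` are placeholders for real infrastructure calls, not any particular tool's API):

```python
def deploy(version):
    """Placeholder for the actual rollout mechanism."""
    print(f"deploying {version}")


def health_check(version):
    """Placeholder for smoke tests run against the live deployment."""
    return version != "v2-broken"  # simulate one bad release


def rollback(to_version):
    """Placeholder for restoring the previous known-good version."""
    print(f"rolling back to {to_version}")


def release(new_version, current_version):
    # Deploy, verify, and recover automatically - replacing the
    # manual pre-release checklist meeting with code.
    deploy(new_version)
    if health_check(new_version):
        return new_version
    rollback(current_version)
    return current_version
```

With this in place, a failed release costs one automated rollback instead of a post-mortem meeting about why the manual checklist missed it.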

vegetablepotpie|1 year ago

This is very well said and succinctly summarizes my frustrations with QA. My experience has been that non-technical staff in technical organizations create meetings to justify their existence. I’m curious if you have advice on how to shift non-technical QA towards adopting automated testing and fewer meetings.

gavmor|1 year ago

> The main factor is quite clearly how much time and resources an organization invests in automated testing.

For context, I think it's worth reflecting on Beck's background, eg as the author of XP Explained. I suspect he's taking even TDD for granted, and optimizing what's left. I think even the name of his new blog—"Tidy First"—is in reaction to a saturation, in his milieu, of the imperative to "Test First".

sourceless|1 year ago

I think we may be violently agreeing - I certainly agree with everything you have said here.

tomxor|1 year ago

I tend to agree. Whenever I've removed artificial technical friction, or made a fundamental change to an approach, the processes that grew around them tend to evaporate, and not be replaced. I think many of these processes are a rational albeit non-technical response to making the best of a bad situation in the absence of a more fundamental solution.

But that doesn't mean they are entirely harmless. I've come across some scenarios where the people driving decisions continued to reach for human processes as the solution rather than a workaround, for both new projects and projects designated specifically to remove existing inefficiencies. They either lacked the technical imagination, or were too stuck in the existing framing of the problem, and this is where people who do have that imagination need to speak up and point out that human processes need to be minimised with technical changes where possible. Not all human processes can be obviated through technical changes, but we don't want to spread ourselves thin on unnecessary ones.

lifeisstillgood|1 year ago

So this seems quantifiable as well - there must be a number of processes / components that a business is made up of, and those presumably are also weighted (payment processing has weight 100, HR holiday requests weight 5 etc).

I would conjecture that changing more than 2% of processes in any given period is “too much” - but one can certainly adjust that.
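As a toy illustration of that conjecture (the process names, weights, and the 2% budget are all invented for the example, not established numbers):

```python
# Weight each business process by criticality, then check whether a
# release period touches more than some fraction of the weighted total.
PROCESS_WEIGHTS = {
    "payment_processing": 100,
    "hr_holiday_requests": 5,
    "reporting": 20,
}
CHANGE_BUDGET = 0.02  # conjectured max fraction of change per period


def change_fraction(changed_processes):
    total = sum(PROCESS_WEIGHTS.values())
    touched = sum(PROCESS_WEIGHTS[p] for p in changed_processes)
    return touched / total
```

Even a "small" change to a low-weight process like HR holiday requests comes out at 5/125 = 4% here, over the 2% budget - which shows how quickly a weighted budget gets consumed when heavy processes dominate the denominator.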

And I suspect that this modifies based on area (ie the payment processing code has a different team than the HR code) - so it would be sensible to rotate releases (or possibly teams) - this period this team is working on the hard stuff, but once that goes live the team is rotated back out to tackle easier stuff - either payment processing or HR

The same principle applies to attacking a trench, moving battalions forward and combined arms operations.

Now that is of course a “management” problem - but one can easily see how to automate a lot of it, and how other “sensory” inputs are useful (ie which teams have committed code to these sensitive modules recently).

One last point: it makes nonsense of “sprints” in Agile/Scrum - we know you cannot sprint a whole marathon, so how do you prepare the sprints for rotation?

gavmor|1 year ago

There are no sprints in agile. ;)

On the contrary, per the Manifesto:

> Agile processes promote sustainable development.

> The sponsors, developers, and users should be able to maintain a constant pace indefinitely.

ozim|1 year ago

I am really interested in organizations capacity of soaking the changes.

I live in the B2B SaaS space, and as far as development goes we could release daily. But on the receiving side we get pushback. Of course there can be feature flags, but then it would create a “not-enabled-feature backlog”.

In the end features are mostly consumed by people and people need training on the changes.

ajmurmann|1 year ago

I think that really depends on the product. I worked on an on-prem data product for years and it was crucial to document all changes well and give customers time to prepare. OTOH I also worked on a home inspection app, and there users gave us pushback on training because the app was seen as intuitive.

vasco|1 year ago

I agree entirely - I use the same references, I just think it's bordering on sacrilege what you did to Mr. Goldratt. He has been writing about flow and translating the Toyota Production System principles and applying physics to business processes way before someone decided to write The Phoenix Project.

I loved The Phoenix Project, don't get me wrong, but compared to The Goal it's like a cheaply produced adaptation of a "real" book so that people in the IT industry don't get scared when they read about production lines and run away saying "but I'm a PrOgrAmmEr, and creATIVE woRK can't be OPtiMizEd like a FactOry".

So The Phoenix Project if anything is the spiritual successor to The Goal, not the other way around.

grncdr|1 year ago

That’s exactly what the GP wrote: The Goal is the spiritual ancestor of The Phoenix Project.

ricardobeat|1 year ago

> By deploying small changes often, you get to deliver value much sooner and fail smaller.

Which increases the number of changes per deployment, feeding the overhead cycle.

He is describing an emergent pattern here, not something that requires intentional culture change (like writing smaller changes). You’re not disagreeing but paraphrasing the article’s conclusion:

> or the harder way, by increasing the number of changes per deployment (better tests, better monitoring, better isolation between elements, better social relationships on the team)

sourceless|1 year ago

I am disagreeing with the conclusion of the article, and asserting that more and smaller deployments are the better way to go.

manvillej|1 year ago

This isn't even a software thing. It's any production process. The more work-in-progress items you have, and the longer they stay in progress, the greater the risk and the greater the amount of work. Shrink the batch, shorten the release window.

It infuriates me that software engineering has had to rediscover these facts when the Toyota production system was developed between 1948-1975 and knew all these things 50 years ago.