top | item 33634335

p0rkbelly | 3 years ago

If your code breaks something, you should fix that code. Who else should?

If your system/product/service is down because you have a dependency on something that broke -- well it's up to that team to fix their code.

I_AM_A_SMURF|3 years ago

Ideally incident handling should "just" be rolling back the broken change. Fixing the underlying problem should happen in the morning with no time pressure, not in the middle of the night, half asleep, with customers on the other side of the world yelling at you. Of course it's not always that simple, but most of the time that's what on-call should be about.

p0rkbelly|3 years ago

It would be nice if things only broke during "business" hours and didn't have real-world impact, never mind impact on millions of people around the world. But look at the customers of, say, code that runs cloud infrastructure: airline reservations/check-ins, government workloads, banks, hospitals, critical infrastructure, Netflix, gaming services. That's a lot of things that typically can't wait until morning.

dimmke|3 years ago

This is the pat answer Amazon gives to defend this absurd practice, but it breaks down really easily.

>If your code breaks something, you should fix that code. Who else should?

What if it wasn't my code, but code written by someone 3 years ago who quit because most people only work at the company for 2 years? And it's in a part of the codebase I've never touched. That's a much more likely scenario.

nevon|3 years ago

That's still your code ("your" meaning the team that owns the product). Who else would own it? The person that left 3 years ago?