item 37121689

kawemi | 2 years ago

It's a really useful perspective in real-life scenarios when you're not developing critical software. Of course, a baseline of risk avoidance is always important, but businesses, customers, and users are, most of the time, ready to handle some risks, like downtime, bugs, and delays. SWEs and developers are the more risk-averse of the two parties, which leads us to overvalue robustness and stability.

For example, it's much easier and faster to implement observability and some way of rolling back bad versions than to try to prevent every possible way an app could crash and trigger a bunch of problems. What happens if the app crashes is pretty simple: customers will be mad (CS/Marketing/PR can handle them), and you'll notice the downtime quickly and roll back (or maybe even roll back automatically!). Then you'll be in a perfect position to deal with what went wrong: systems will be back in a known stable state, and all the stress of trying to fix something in a live production system will be gone.
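
The "roll back automatically" idea can be sketched in a few lines of Python. This is a minimal illustration, not anyone's production setup: it assumes a hypothetical `Deployment` object that keeps an ordered history of released versions, and an error count and request count coming from whatever observability stack you use. The 5% threshold and 100-request minimum are arbitrary example values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Deployment:
    # Hypothetical release history, oldest first; the last entry is live.
    versions: List[str]

    @property
    def live(self) -> str:
        return self.versions[-1]

    def rollback(self) -> str:
        """Drop the live version and return it; the previous one becomes live."""
        if len(self.versions) < 2:
            raise RuntimeError("no previous version to roll back to")
        return self.versions.pop()


def error_rate(errors: int, requests: int) -> float:
    return errors / requests if requests else 0.0


def check_and_maybe_rollback(dep: Deployment, errors: int, requests: int,
                             threshold: float = 0.05,
                             min_requests: int = 100) -> bool:
    """Roll back if the observed error rate exceeds the threshold.

    Returns True if a rollback happened. Assumed values: 5% threshold,
    and at least 100 requests before judging, to avoid reacting to noise.
    """
    if requests < min_requests:
        return False  # not enough traffic to judge the new version yet
    if error_rate(errors, requests) > threshold:
        dep.rollback()  # back to the last known stable version
        return True
    return False


if __name__ == "__main__":
    dep = Deployment(["v1.4.0", "v1.5.0"])
    print(check_and_maybe_rollback(dep, errors=120, requests=1000), dep.live)
```

Running it prints `True v1.4.0`: a 12% error rate on the new build triggers a rollback to the previous version. The point is the one made above: the check is cheap to write, while the actual diagnosis can happen later, calmly, against a stable system.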

johnmaguire | 2 years ago

Of course, there is no magic bullet. Some problems aren't solved by rolling back services (e.g., redeploying an old build triggers a thundering herd of clients that overloads your database).

kawemi | 2 years ago

Yes, of course; my hypothetical situation assumed a pretty boring failure with an easy way out (rollback). The underlying principle is that, most of the time, trying to preempt every situation is far more work than being aware of them and giving yourself and your team(s) reasonable tools to mitigate them :)