(no title)
palcu | 2 years ago
Personally, I wasn't part of the actual mitigation of the overall Paris DC recovery this time, as I was busy with an unfortunate[0] side effect of the outage. Side effects like this generate more anxiety: being woken up at 6am and told that nobody understands exactly why the system is acting this way is not great. But then again, we're trained for this situation, and there are always at least several ways of fixing the issue.
Finally, it's worth repeating that incident management is just one part of the SRE job, and after several years I've understood that it is not the most important one. The best SREs I know are not great when it comes to a huge incident, but their work has avoided the other 99 outages that could have appeared on the front page of Hacker News.
Waterluvian|2 years ago
throwbigdata|2 years ago
palcu|2 years ago
[0]: https://news.ycombinator.com/item?id=32161755