(no title)
pommers | 7 years ago
The places it falls down are where we interface with other teams who aren't on call for their systems; for them, a weekend-long outage is "acceptable".
wikibob|7 years ago
I suggest you look at the on-call chapters in the SRE book, SRE Workbook, and Seeking SRE.
The solution is primarily to include the development team in the on-call rotation (you build it - you run it). This can be very hard to do politically.
michaelt|7 years ago
pommers|7 years ago
We're hiring people with on-call duty as an explicit part of the position they are taking.
As for the other teams, we're working on the politics to get them to support their systems, and looking at alternatives to using them if they don't.