ekimekim | 11 months ago
In my preferred model of on-call, you have a primary, then after 5 minutes an escalation to a secondary, then after another 5 minutes an escalation to something drastic (sometimes "everyone", sometimes a manager).
The expectation is that most of the time you should be able to respond within 5 minutes, but if you can't, that's what the secondary role is for: to catch you. This means it's perfectly acceptable to go for a run, go to a movie, etc.
You relax the responsibility on the individual and let a sensible amount of redundancy solve the problem instead. Everyone is less stressed, and sure, you get the occasional 5-minute delay in response, but I'm willing to bet that the overall MTTR is lower since people are well rested and happier to be on call to begin with.
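For anyone who wants to see that chain written down, here is a rough sketch in Python. The names (`EscalationStep`, `POLICY`, `paged_targets`) and the choice of "everyone" as the final step are made up for illustration; this is not any particular paging tool's config format, just the 0/5/10 minute timing described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EscalationStep:
    target: str          # who gets paged at this step
    delay_minutes: int   # minutes after the alert fires, if still unacknowledged


# Hypothetical policy: primary immediately, secondary after 5 minutes,
# something drastic ("everyone" here, could be a manager) after 5 more.
POLICY = [
    EscalationStep("primary", 0),
    EscalationStep("secondary", 5),
    EscalationStep("everyone", 10),
]


def paged_targets(minutes_unacked: float, policy: List[EscalationStep] = POLICY) -> List[str]:
    """Return every target that has been paged once an alert has gone
    unacknowledged for the given number of minutes."""
    return [step.target for step in policy if minutes_unacked >= step.delay_minutes]
```

So an alert acked within the first 5 minutes never leaves the primary, and the worst case for a missed page is a 5-minute delay before the secondary is brought in.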
Anon1096|11 months ago
jobs_throwaway|11 months ago
notnaut|11 months ago
closeparen|11 months ago
hylaride|11 months ago
You also need *ownership*. There is nothing worse than having to support somebody else's work and not being allowed (either via time or other restrictions) to do things "right" so that you're not always paged for fixable problems. Everywhere I've worked where the techs had ownership (which varied from ops people being allowed to override the backlog to fix issues, to developers being given enough free rein to fix technical debt), on-call has usually been barely an issue. At my current gig I often forget I'm even on call at all, and the main issues that do crop up are usually external.
happymellon|11 months ago
Things like running in AWS but having to use a custom K8s install so that they aren't dependent on AWS.
Or using self-managed Kafka so that you aren't dependent on proprietary tech.
It all sucks because they are always less reliable and generate their own errors and noise for on-calls.
If they had to deal with phone calls every time there's a firewall issue that had absolutely nothing to do with the application, they would soon change their tune.
WhyIsItAlwaysHN|11 months ago
ekimekim|11 months ago
andrewaylett|11 months ago
If the primary (paid) on-call doesn't catch the notification, the secondary (unpaid) will be paged. And so on, down a couple more steps, to a senior manager. There's no expectation that anyone other than the primary would actually be available to ack the alert.
smitelli|11 months ago
inetknght|11 months ago
That's an unreasonable expectation unless it's clearly stated in writing and counts as billable hours.