quelltext | 1 year ago
I'm not sure we're all on the same page here, but let me give you an example of how on-call essentially works on my team.
- Week long rotations spread out across the year among members.
- On-call means holding a pager but also taking in any non-urgent requests that can be handled within a reasonable time. New feature requests are out of scope; answering a bug report from support is in scope, including a fix if one is possible.
- At night, we respond only to paging alerts. On some teams, sister teams in other regions covered some portion of the night with their own on-call.
- Generally, paging alerts are rare enough (once or twice a week) that out-of-hours disruption is fairly low.
- Non-urgent breakages, bug reports, etc. are fairly common though.
Someone has to handle all that, so it's a rotation. I don't think providing incentives for engineers to take more on-call is practical, unless you are okay with them stagnating in their careers. And since it's the EM asking here, I'd hope they don't want that.
ipnon | 1 year ago
To put it simply, there are jobs in your organization that are not anyone's responsibility, so when they come up they go onto the heap of "non-important" things to do. This is unfortunately common in software organizations. The problem is that if this heap gets too large, it catches fire. And assigning an engineer to spray water on this flaming trash heap on a reliable schedule is not what most people consider a fulfilling part of their job.
So, to answer your question: perhaps, in addition to giving extraordinary compensation for work that is by definition extraordinary (if it's ordinary work, why does it need a special on-call system to handle it?), it is also best to make sure that items which regularly end up on the on-call heap become the responsibility of a specific person. In an early-stage company, customer support can be handled by the founder, bugs can be handled as part of sprints, and root cause analysis should be done as the final step of any on-call alert as a matter of good practice.
It's my belief, again, that making on-call unreasonably expensive incentivizes the larger organization to create a system that handles bugs, customer support, and reports before they end up on the flaming trash heap, and that in the long term this reduces costs, churn, and burnout. I again point to Will Larson, because all my thinking on this developed from his work.[1]
To put it succinctly: making on-call just another job responsibility incentivizes the creation of an eternal flaming trash heap that a single, poor engineer must fight on a reliable schedule (not fun). Recognizing that on-call is by nature an extraordinary job responsibility, and compensating on-call engineers in extraordinary fashion, incentivizes the larger organization (i.e. executives, directors, and managers) to build systems that minimize, extinguish, and eventually destroy the flaming trash heap (yay).
[0] Organization smell: analogous to a "code smell", where a programmer with sufficient intuition can tell something is amiss without being able to describe it precisely right away.
[1] https://lethain.com/doing-it-harder-and-hero-programming/. I recommend buying "An Elegant Puzzle" because some of his best essays on the subject of on-call are only available in the book, not on his blog.