top | item 39582136


softirq | 2 years ago

Most companies completely missed the point of SRE/PE/DevOps. They keep these people on separate teams doing sysadmin toil and on-call work thrown over the wall by engineers who only care about feature deadlines. Those teams regress into plain sysadmin duties, and the company gets none of the value of a true SRE program.

SRE should always be a subtitle for a SWE, not a separate position, and SREs should always be embedded with SWEs in one team, building either products or infrastructure. Shared ownership and toil reduction only work if you have both of these things.

All that said, I think the regression is also due to the fact that real SREs are rare. A solid SWE who also has deep systems domain knowledge, knows how to sift through dashboards and live data, and can root-cause complex performance problems is a master of many domains and is hard to find.



snowfield | 2 years ago

The regression is also due to the fact that real SREs are expensive. It's cheaper to just hire some new grads to react to alarms by following a set runbook that says what to do when each alarm triggers.

VERY few companies operate at Google's scale. For 99.99% of companies, it makes sense to investigate single-machine issues.

bananapub|2 years ago

Google SREs also end up investigating single-machine issues, FYI.