faoileag | 4 years ago

Am I missing something, or is this article really advocating not counting the worst thing that can happen to a utility (electricity provider, traffic lights, streaming service, online shop...), namely the unavailability of the service it offers? That is an interesting approach. Well, one thing's for sure: even if you don't count these things, your customers will.

chronid|4 years ago

My interpretation is that it's advocating counting the things that matter, not the consequence (e.g., the "SEV" event).

The problem is your availability dropping below X%, or your API taking more than X s to respond Y% of the time, not "we had X SEVs in the last half". People will gravitate to numbers that are talked about (because they at least appear important), so you should try to talk about the right ones.
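To make the distinction concrete, here is a minimal Python sketch, assuming each request is logged with a success flag and a latency. The `Request` type, the function names and the thresholds are hypothetical, not taken from the article or from any particular monitoring tool. It computes the two kinds of numbers described in the comment (availability as a fraction of requests, and the share of responses slower than a threshold) rather than a count of SEVs.

```python
from dataclasses import dataclass


@dataclass
class Request:
    ok: bool          # hypothetical: did the request succeed?
    latency_s: float  # hypothetical: response time in seconds


def availability(requests):
    """Fraction of requests that succeeded (one possible SLI definition)."""
    if not requests:
        return 1.0  # assumption: no traffic counts as fully available
    return sum(r.ok for r in requests) / len(requests)


def slow_fraction(requests, threshold_s):
    """Fraction of requests that took longer than threshold_s seconds."""
    if not requests:
        return 0.0
    return sum(r.latency_s > threshold_s for r in requests) / len(requests)


def check_slos(requests, min_availability, latency_threshold_s, max_slow_fraction):
    """Compare the measured numbers against hypothetical targets."""
    return {
        "availability_ok": availability(requests) >= min_availability,
        "latency_ok": slow_fraction(requests, latency_threshold_s) <= max_slow_fraction,
    }


if __name__ == "__main__":
    sample = [
        Request(True, 0.12),
        Request(True, 0.30),
        Request(False, 2.5),
        Request(True, 0.08),
    ]
    print(availability(sample))                  # 0.75
    print(slow_fraction(sample, 1.0))            # 0.25
    print(check_slos(sample, 0.99, 1.0, 0.05))   # both targets missed
```

The point of the design is that the targets are expressed in terms users feel (failed or slow requests), so an incident being opened or not does not change the number.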

It's hard enough to convince junior folks to open incidents without them feeling like they will mess up a very-important-management-eyes-on-it metric if they do...

karmakaze|4 years ago

I get the point of choosing key metrics that are indicative of the reliability or quality of service. Having to narrow it down to a headline makes it less clear. For a datacentre region, counting the number of outages may very well be a great bottom-line indicator, with more fine-grained objectives tracked alongside it.