4by4by4 | 3 years ago

I’m curious what services you interact with that have five nines availability. Most services that I interact with, AWS APIs as an example, top out at an SLA of four nines for availability.

rwiggins|3 years ago

There is some nuance there. Because SLA violations typically have real-money impact, they're often backed by internal objectives (SLOs) that are quite a bit tighter. So if an API promises four nines of availability in its SLA, there is probably an internal target that's at least four-and-a-half nines, if not five.

Which is to say, I would not be surprised if many APIs with a four-nines SLA were actually closer to five nines.

From a planning perspective, though, you should probably stick to the public guarantees (i.e. the SLA). Even then, things can always go horribly wrong, like in the big Atlassian deletion snafu recently.
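To put numbers on those tiers, the yearly downtime budget for a given availability is just the unavailable fraction multiplied by the length of a year. A minimal sketch, assuming a 365-day year and treating "four-and-a-half nines" as 99.995%; the function name `downtime_budget_minutes` is illustrative and not from any library:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year, 525,600 minutes


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the downtime allowed per year, in minutes, at a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


# Availability tiers from two nines up to five nines.
tiers = [
    ("two nines", 99.0),                 # about 5,256 minutes (roughly 3.65 days)
    ("three nines", 99.9),               # about 526 minutes (roughly 8.8 hours)
    ("four nines", 99.99),               # about 52.6 minutes
    ("four-and-a-half nines", 99.995),   # about 26.3 minutes
    ("five nines", 99.999),              # about 5.3 minutes
]

for label, percent in tiers:
    minutes = downtime_budget_minutes(percent)
    print(f"{label:>22} ({percent}%): {minutes:9.2f} min/year")
```

On these assumptions, moving from a four-nines SLA to a five-nines internal target shrinks the yearly budget from a little under an hour to a little over five minutes.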

citizenpaul|3 years ago

I'm saying SRE/SLA has become a commodity. No one thinks about it; they just expect the systems to be up. The article argues that SRE can be more than "downtime." I propose that it cannot, and that they are wrong.

citizenpaul|3 years ago

I guess I made a poor point. I am saying that SRE has basically solidified into the expectation that availability in the high 99s is the bare minimum. Even four nines allows only about an hour of downtime per year.

There really isn't much to discuss in the SRE industry. You basically have to be flawless. It's like being the janitor: miss the garbage pickup one day and heads roll. If you have SREs and the service is down, you are done. There is no longer any nuance, which is why the CEO in the OP doesn't think about SRE any more than about garbage collection.

dijit|3 years ago

Even Google's own load balancer only offers 4 nines.

https://cloud.google.com/compute/sla

citizenpaul|3 years ago

OMG, you guys are missing the point. The article is not talking about what a good SLA level is. It's saying that SRE "could be so much more." My point is: no, it cannot. SRE is now a commodity; the expectation is that the system is up pretty much always. Even four nines is only an hour or so of downtime a year. Jeez.