top | item 45918745

(no title)

Premature optimization. Not every single service needs or require 5 nines.

discuss

What does that mean, though?

If I'm storing data on a NAS, and I keep backups on a tape, a simple hardware failure that causes zero downtime on S3 might take what, hours to recover? Days?

If my database server dies and I need to boot a new one, how long will that take? If I'm on RDS, maybe five minutes. If it's bare metal and I need to install software and load my data into it, perhaps an hour or more.

Being able to recover from failure isn't a premature optimization. "The site is down and customers are angry" is an inevitability. If you can't handle failure modes in a timely manner, you aren't handling failure modes. That's not an optimization, that's table stakes.

It's not about five nines, it's about four nines or even three nines.

ffsm8|3 months ago

You're confusing backup with high availability.

Backups are point in time snapshots of data, often created daily and sometimes stored on tape.

It's primary usecase is giving admins the ability to e.g restore partial data via export and similar. It can theoretically also be used to restore after you had a full data loss, but that's beyond rare. Almost no company has had that issue.

This is generally not what's used in high availability contexts. Usually, companies have at least one replica DB which is in read only and only needs to be "activated" in case of crashes or other disasters.

With that setup you're already able to hit 5 nines, especially in the context of b2e companies that usually deduct scheduled downtimes via SLA

bcrl|3 months ago

I know one company that strove for five sixes.

chasd00|3 months ago

you have to look at all the factors, a simple server in a simple datacenter can be very very stable. When we were all doing bare metal servers back in the day server uptimes measured in years wasn't that rare.

unknown|3 months ago

[deleted]

dspillett|3 months ago

This is true. Also some things are just fine, in fact sometimes better (better performing at the scale they actually need and easier to maintain, deploy, and monitor), as a single monolith instead of a pile of microservices. But when comparing bare metal to cloud it would be nice for people to acknowledge what their solution doesn't give, even if the acknowledgement comes with the caveat “but we don't care about that anyway because <blah>”.

And it isn't just about 9s of uptime, it is all the admin that goes with DR if something more terrible then a network outage does happen, and other infrastructure conveniences. For instance: I sometimes balk at the performance we get out of AzureSQL given what we pay for it, and in my own time you are safe to bet I'll use something else on bare metal, but while DayJob are paying the hosting costs I love the platform dealing with managing backup regimes, that I can do copies or PiT restores for issue reproduction and such at the click of the button (plus a bit of a wait), that I can spin up a fresh DB & populate it without worrying overly about space issues, etc.

I'm a big fan of managing your own bare metal. I just find a lot of other fans of bare metal to be more than a bit disingenuous when extolling its virtues, including cost-effectiveness.

dpkirchner|3 months ago

It's true, but I'm woken up more frequently if there are fewer 9s, which is unpleasant. It's worth the extra cost to me.

hdgvhicv|3 months ago

Hence you can use AWS to host them.

withinboredom|3 months ago

and each additional nine increases complexity geometrically.