bastawhiz|3 months ago
If I'm storing data on a NAS and keeping backups on tape, a simple hardware failure that would cause zero downtime on S3 might take what, hours to recover from? Days?
If my database server dies and I need to boot a new one, how long will that take? On RDS, maybe five minutes. On bare metal, where I need to install software and load my data, perhaps an hour or more.
Being able to recover from failure isn't a premature optimization. "The site is down and customers are angry" is an inevitability. If you can't handle failure modes in a timely manner, you aren't handling failure modes at all. That's not an optimization; that's table stakes.
It's not about five nines; it's about four nines, or even three.
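The gap between those nines is easy to put in numbers; a quick back-of-the-envelope sketch:

```python
# Downtime budget per year implied by each availability target ("nines").
# Three nines leaves ~8.7 hours/year; five nines leaves ~5 minutes --
# which is why an hour-plus bare-metal rebuild can blow the whole budget.
MINUTES_PER_YEAR = 365 * 24 * 60

def downtime_budget_minutes(nines: int) -> float:
    """Minutes of allowed downtime per year for N nines of availability."""
    availability = 1 - 10 ** -nines
    return MINUTES_PER_YEAR * (1 - availability)

for n in (3, 4, 5):
    print(f"{n} nines: {downtime_budget_minutes(n):.1f} min/year allowed")
```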
ffsm8|3 months ago
Backups are point-in-time snapshots of data, often created daily and sometimes stored on tape.
Their primary use case is giving admins the ability to restore partial data, e.g. via an export. They can theoretically also be used to recover from a full data loss, but that's exceedingly rare; almost no company has had that issue.
This is generally not what's used in high-availability contexts. Usually, companies run at least one read-only replica DB that only needs to be promoted in case of a crash or other disaster.
With that setup you can already hit five nines, especially for B2E companies, which usually exclude scheduled downtime from their SLAs.
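The replica-promotion pattern described above boils down to a health-check loop. A minimal sketch, where `is_primary_healthy` and `promote_replica` are hypothetical stand-ins for whatever your stack actually provides (e.g. `pg_isready` / `pg_promote()` in PostgreSQL, or a managed-failover API):

```python
import time

def failover_loop(is_primary_healthy, promote_replica,
                  checks_before_failover=3, interval_s=5):
    """Promote the read-only replica after N consecutive failed checks.

    Requiring several consecutive failures avoids failing over on a
    single transient blip (a "flapping" primary).
    """
    failures = 0
    while True:
        if is_primary_healthy():
            failures = 0
        else:
            failures += 1
            if failures >= checks_before_failover:
                promote_replica()  # replica starts accepting writes
                return "failed over"
        time.sleep(interval_s)
```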
bcrl|3 months ago
You have to look at all the factors: a simple server in a simple datacenter can be very, very stable. Back when we were all running bare-metal servers, uptimes measured in years weren't that rare.
chasd00|3 months ago
This is true. Some things are also just fine (in fact sometimes better: better performing at the scale they actually need, and easier to maintain, deploy, and monitor) as a single monolith instead of a pile of microservices. But when comparing bare metal to cloud, it would be nice for people to acknowledge what their solution doesn't give, even if the acknowledgement comes with the caveat "but we don't care about that anyway because <blah>".
dspillett|3 months ago
And it isn't just about nines of uptime; it's all the admin that goes with DR if something more terrible than a network outage does happen, plus other infrastructure conveniences. For instance: I sometimes balk at the performance we get out of AzureSQL given what we pay for it, and in my own time you can safely bet I'll use something else on bare metal. But while DayJob is paying the hosting costs, I love that the platform handles the backup regime, that I can do copies or point-in-time restores for issue reproduction at the click of a button (plus a bit of a wait), that I can spin up a fresh DB and populate it without worrying overly about space issues, etc.
I'm a big fan of managing your own bare metal. I just find a lot of other fans of bare metal to be more than a bit disingenuous when extolling its virtues, including its cost-effectiveness.
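For contrast, a self-managed point-in-time restore on bare-metal PostgreSQL roughly means restoring a base backup, dropping a `recovery.signal` file in the data directory, and setting recovery targets. A sketch (PostgreSQL 12+; the archive path and timestamp are placeholders):

```
# postgresql.conf -- after restoring a base backup and creating an
# empty recovery.signal file in the data directory:
restore_command = 'cp /mnt/wal_archive/%f "%p"'   # placeholder WAL archive path
recovery_target_time = '2024-01-15 09:30:00+00'   # replay WAL up to this moment
recovery_target_action = 'promote'                # open read-write when reached
```

That manual choreography is what the managed platform's "click of a button" is hiding.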