(no title)
shantly | 6 years ago
[EDIT] oh and of course The Cloud itself dies from time to time, too. Usually due to configuration fuck-ups on their absurd Rube Goldberg deployment processes :-) I don't think one safely-managed (see above points) server is a ton worse than the kind of cloud use any mid-sized-or-smaller business can afford, outside certain special requirements. Your average CRUD app? Just rent a server from some place with a good reputation, once you have paying customers (just host on a VPS or two until then). All the stuff you need to do to run it safely you should be doing with your cloud shit anyway (testing your backups, testing your re-deploy-from-scratch capability, "shit's broken" alerts) so it's not like it takes more time or expertise. Less, really.
abraxas|6 years ago
That's a lot of "cloud" right there in a single server.
carlsborg|6 years ago
AWS gives you availability zones, which are usually physically distinct datacenters in a region, and multiple regions. Well designed cloud apps failover between them. Very very rarely have we seen an outage across regions in AWS, if ever.
shantly|6 years ago
cpitman|6 years ago
And by the way, how much are they willing to spend on their desired level of availability?
I still need a better way to run these conversations, but I'm trying to find a way to bring it back to cost. How much does an hour of downtime really cost you?
peterwwillis|6 years ago
> An awful lot of server systems can tolerate a hardware failure on their one server every couple years given 1) good backups, 2) "shit's broken" alerts, and 3) reliable push-button re-deploy-from-scratch capability, all of which you should have anyway
Just.... just... no. First of all, nobody's got good backups. Nobody uses tape robots, and whatever alternative they have is poor in comparison, but even if they did have tape, they aren't testing their restores. Second, nobody has good alerts. Most people alert on either nothing or everything, so they end up ignoring all alerts, so they never realize things are failing until everything's dead, and then there goes your data, and also your backups don't work. Third, nobody needs push-button re-deploy-from-scratch unless they're doing that all the time. It's fine to have a runbook which documents individual pieces of automation with a few manual steps in between, and this is way easier, cheaper and faster to set up than complete automation.
shantly|6 years ago
But you should test your backups and set up useful alerts with the cloud, too.
> Third, nobody needs push-button re-deploy-from-scratch unless they're doing that all the time. It's fine to have a runbook which documents individual pieces of automation with a few manual steps in between, and this is way easier, cheaper and faster to set up than complete automation.
Huh. I consider getting at least as close as possible to that, and ideally all the way there, vital to developer onboarding and productivity anyway. So to me it is something you're doing all the time.
[EDIT] more to the point, if you don't have rock-solid redeployment capability, I'm not sure how you have any kind of useful disaster recovery plan at all. Backups aren't very useful if there's nothing to restore to.
[EDIT EDIT] that goes just as much for the cloud—if you aren't confident you can re-deploy from nothing then you're just doing a much more complicated version of pets rather than cattle.