The major ones that came to mind were the cloud providers (AWS, GCP, Azure, etc.). Even if you manage to build in resilience to a major outage, I'd wager that many of the services your business depends on will not have the same level of redundancy.
Especially considering that most of the time services go down, they go down because someone tried to change something, not because something just broke out of nowhere. Sure, hard drive failures happen, but it's much more likely that your testing/QA process misses something and that something breaks your platform when you deploy it.
Adding to that, distributed architectures are much harder to change correctly than the alternatives, so you end up making things harder to change in order to be more "resilient" against hardware/network failures, and in return you get a higher "deploy failure" rate.
capableweb|3 years ago