item 40453379


Genuine question, are distributed systems naturally more resilient?

I can see arguments for both sides: your point, and then the hidden failure modes that come from having no central observability or ownership. Nothing exists in isolation.


zevv|1 year ago

Not distributed per se, but diversity makes a huge difference in resilience.

When everybody is using the exact same tech, the fallout of an incident can be huge, because it affects everybody everywhere at the same time. Superficially it might seem efficient and smart, but the end result is fragility.

Diversity of species is what nature ended up with as the ultimate solution: the individual species do not matter, but life as a whole will be able to flourish. With technology, we're now moving the other way: every single thing gets concentrated into one of the few cloud providers. Resilience decreases, fragility increases.

salawat|1 year ago

I prefer "heterogeneity" to "diversity". Different implementations of similar processes generally make different tradeoffs and hit different bottlenecks, which results in an ecosystem with a higher statistical probability that a single Black Swan event won't wipe out a key structural function in its entirety.
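A toy illustration of the heterogeneity argument, assuming a fault that is specific to one implementation (a shared bug, say) and takes down every node running it. The fleet sizes and implementation names are hypothetical.

```python
from collections import Counter


def surviving_fraction(fleet, failed_impl):
    """Fraction of nodes still up after a fault that hits only one implementation."""
    counts = Counter(fleet)
    return 1 - counts[failed_impl] / len(fleet)


# Hypothetical 12-node fleets; "impl_a" etc. are made-up names.
homogeneous = ["impl_a"] * 12
heterogeneous = ["impl_a"] * 4 + ["impl_b"] * 4 + ["impl_c"] * 4

print(surviving_fraction(homogeneous, "impl_a"))    # 0.0: the monoculture is wiped out
print(surviving_fraction(heterogeneous, "impl_a"))  # about 0.667: two thirds keep running
```

The same bug costs the monoculture everything and the mixed fleet a third of its capacity; that is the whole argument, as long as the implementations really do fail independently.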

It's actually a hallmark of building fault-tolerant systems and ecosystems. It's a pity the economists and MBAs can't be convinced of it; otherwise there'd be less push to create too-big-to-fail (TBTF) institutions.

decremental|1 year ago

[deleted]

_heimdall|1 year ago

Distribution alone doesn't make a system resilient. A distributed system can help with resilience for anything related to network or hardware failure, but even then you need to make sure the different resources don't have a hard dependency on each other.

If you want a resilient system, redundancy and automatic failover are really important, along with solid error handling.

Think about a distributed data store, for example. You may spread all your data across multiple regions, but if each region manages its own shard of the data and the shards aren't replicated, you still lose functionality when any one region goes down. If you instead keep a complete copy of the data in each region, plus a system that automatically switches regions when the primary goes down, your system is much more resilient to outages (though also more complex and expensive).
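A minimal in-memory sketch of the two layouts described above: a sharded store where each key has exactly one home region, and a fully replicated store whose reads fail over to the next healthy region. Everything here (class names, region names, CRC32-based sharding, the naive write path with no consensus) is a hypothetical simplification, not how any particular database does it.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str
    data: dict = field(default_factory=dict)
    up: bool = True


class ShardedStore:
    """Each key lives in exactly one region; nothing is replicated."""

    def __init__(self, regions):
        self.regions = regions

    def _owner(self, key):
        # Stable toy sharding: CRC32 of the key, modulo the number of regions.
        return self.regions[zlib.crc32(key.encode()) % len(self.regions)]

    def put(self, key, value):
        self._owner(key).data[key] = value

    def get(self, key):
        region = self._owner(key)
        if not region.up:
            raise RuntimeError(f"shard in {region.name} is down")
        return region.data[key]


class ReplicatedStore:
    """Every region holds a full copy; reads fail over in list order."""

    def __init__(self, regions):
        self.regions = regions  # regions[0] acts as the primary

    def put(self, key, value):
        # Toy write path: every healthy region gets the write, no consensus.
        for region in self.regions:
            if region.up:
                region.data[key] = value

    def get(self, key):
        # Automatic failover: serve from the first healthy region.
        for region in self.regions:
            if region.up:
                return region.data[key]
        raise RuntimeError("all regions are down")


def readable(store, keys):
    """Count how many keys can still be read."""
    ok = 0
    for key in keys:
        try:
            store.get(key)
            ok += 1
        except RuntimeError:
            pass
    return ok


keys = [f"user:{i}" for i in range(30)]
sharded = ShardedStore([Region("us"), Region("eu"), Region("ap")])
replicated = ReplicatedStore([Region("us"), Region("eu"), Region("ap")])
for key in keys:
    sharded.put(key, "v")
    replicated.put(key, "v")

# Take the first region down in both stores.
sharded.regions[0].up = False
replicated.regions[0].up = False

print(readable(sharded, keys), "of", len(keys))     # keys owned by "us" are now unreadable
print(readable(replicated, keys), "of", len(keys))  # 30 of 30
```

The replicated version pays for this with three times the storage and, in a real system, the hard problem this sketch skips: keeping the copies consistent while writes keep arriving.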

Timshel|1 year ago

It doesn't guarantee resiliency, but it does increase it.

If mastodon.social disappeared tomorrow, the network might lose 80% of its content, but recovery could be possible even if the server never came back.

oefrha|1 year ago

With a large number of small providers, more often than not some of them will fail on any given day, but stars need to align really well to get a half-of-the-internet-is-down kind of failure caused by AWS or Cloudflare.
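Some rough numbers behind this, assuming (and this is the big assumption) that the small providers fail independently. The 100 providers and the 1% daily failure rate are made-up figures. By contrast, if half the internet sits behind one provider with the same 1% failure rate, the "half the internet is down" event simply happens with probability 0.01.

```python
from math import comb


def prob_at_least(n, k, p):
    """P(at least k of n independent providers fail), each failing with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n, p = 100, 0.01  # hypothetical: 100 small providers, each down on 1% of days

print(n * p)                        # expected failures per day: 1.0
print(prob_at_least(n, 1, p))       # at least one provider down: about 0.63
print(prob_at_least(n, n // 2, p))  # half of them down at once: vanishingly small
```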

halfcat|1 year ago

Not exactly “more resilient”, but rather, “the only way to gain more resiliency over a single system”.

A distributed system can be more resilient, but it also adds complexity, making it (sometimes) less reliable.

A single system with a lot of internal redundancy can be more reliable than a poorly implemented distributed system, which is why at a smaller scale it’s often better to scale vertically until a single node can’t handle your needs.

Distributed systems are more of a necessity than “the best way”. If we could just build a single node that scaled infinitely, that would be more reliable than a distributed system.

cpeterso|1 year ago

“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” — Leslie Lamport, 1987

steve1977|1 year ago

Distributed systems with tight coupling and no redundancy are less resilient. It's not so much a question about distribution but more about redundancy and coupling.
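This is the usual reliability-block-diagram arithmetic. A small sketch, assuming independent failures and a made-up 99% availability per component: tightly coupled components sit in series (all must be up), redundant replicas sit in parallel (one is enough).

```python
from math import prod


def series(*avail):
    """Tightly coupled: every component must be up, so availabilities multiply."""
    return prod(avail)


def parallel(*avail):
    """Redundant: the system is down only if every replica is down."""
    return 1 - prod(1 - a for a in avail)


a = 0.99  # hypothetical per-component availability

print(series(a, a, a))    # three hard dependencies: about 0.970
print(parallel(a, a, a))  # three interchangeable replicas: about 0.999999
```

Same three boxes, opposite outcomes: distribution alone doesn't decide which way it goes, the coupling does.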

naasking|1 year ago

> Genuine question, are distributed systems naturally more resilient?

Only if they've prioritized the "availability" component from the CAP theorem.
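A deliberately simplified sketch of that tradeoff during a network partition, assuming 5 replicas split 3/2. A consistency-first system commonly requires a majority quorum to accept a write (as Raft- or Paxos-style systems do); an availability-first system accepts writes on whichever side the client can reach and has to reconcile divergent data afterwards. The function names are hypothetical.

```python
def cp_write(reachable, total):
    """Consistency-first: accept a write only if a majority of replicas can acknowledge it."""
    return reachable > total // 2


def ap_write(reachable, total):
    """Availability-first: accept the write if any replica is reachable, reconcile later."""
    return reachable >= 1


TOTAL = 5
for side, reachable in (("majority side", 3), ("minority side", 2)):
    print(side, "CP:", cp_write(reachable, TOTAL), "AP:", ap_write(reachable, TOTAL))
# majority side CP: True AP: True
# minority side CP: False AP: True
```

Clients stuck on the minority side see the CP system as "down" for writes, which is exactly the resilience being traded away for consistency.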

Dalewyn|1 year ago

>are distributed systems naturally more resilient?

All else being equal: Yes.

It's like asking if a RAID1 is more resilient than a single drive.

steve1977|1 year ago

RAID1 is mirrored. That is not what I would call a typical distributed system. It is a very redundant system. Like a cluster.

A distributed system without redundancy would be something more like data striped across disks without parity.

And that actually makes it less resilient, because failure of one component can bring down the whole system and the likelihood of failure is statistically higher because of the higher number of components.
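Back-of-the-envelope numbers for mirroring vs. striping, assuming independent drive failures and ignoring rebuild windows; the 3% annual failure probability per drive is a made-up figure.

```python
def p_loss_single(p):
    """One drive: data is lost if that drive fails."""
    return p


def p_loss_raid1(p, mirrors=2):
    """Mirroring: data is lost only if every mirror fails."""
    return p**mirrors


def p_loss_raid0(p, disks):
    """Striping without parity: data is lost if any disk fails."""
    return 1 - (1 - p) ** disks


p = 0.03  # hypothetical annual failure probability per drive

print(p_loss_single(p))    # 0.03
print(p_loss_raid1(p))     # about 0.0009
print(p_loss_raid0(p, 4))  # about 0.115
```

More components without redundancy means more ways to fail, which is why the 4-disk stripe comes out almost four times worse than a single drive.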

CWuestefeld|1 year ago

To the GP's point - if you lose the RAID controller, then you've lost a whole lot more than just a single drive failure.
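The controller is a shared dependency sitting in series with the mirrored pair. A rough sketch, assuming independent failures, that a dead controller takes the whole array offline, and made-up annual failure probabilities:

```python
def p_unavailable(p_drive, p_ctrl):
    """RAID1 behind one controller: unavailable if the controller fails or both drives fail."""
    # Assumes independent failures; the array is up only if the controller is up
    # and at least one of the two mirrored drives is up.
    return 1 - (1 - p_ctrl) * (1 - p_drive**2)


p_drive, p_ctrl = 0.03, 0.01  # hypothetical annual failure probabilities

print(p_drive**2)                      # mirrored drives alone: about 0.0009
print(p_unavailable(p_drive, p_ctrl))  # with the controller: about 0.0109
```

The single controller, not the drives, ends up setting the floor: once it is in the picture, adding more mirrors barely moves the result.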