(no title)
vhold | 3 years ago
By having everything so standardized and consistent, they had the exact same failure mode everywhere and lost redundant fault tolerance. If they had different interoperable switches, running different software, the outage wouldn't have been absolute.
When large complex distributed systems grow organically over time, they tend to wind up with diversity. It usually takes a big centralized project focused on efficiency to destroy that property.
yusyusyus|3 years ago
The practical downsides of this diversity live in the complexity of the interop (often slowing feature velocity), operations, and procurement/support.
But issues like the AT&T 4ESS outage have occurred before in IP networks, as an example, in some BGP bug. Diversity alleviates some of the global impact.
vlovich123|3 years ago
You can sometimes play this game with vendors because you want them to give you an interoperable interface so that you avoid vendor lock-in and have better pricing, but that’s a secondary benefit and staged rollouts should still be performed even if you have heterogenous software.
kortilla|3 years ago