That's still ridiculously slow.
I'd expect them to have hundreds of microservices. Each one of those should be able to handle a random restart at any point in time, so they should absolutely be able to restart hundreds of servers concurrently without major disruptions.
Hell, at Facebook scale a whole datacenter going down should not cause service disruptions.
Closi|2 years ago
Taking 45 days is probably more about caution and resolving issues systematically than about pushing a big button and hoping you don’t cause issues.
I’d expect them to have thousands of microservices - and you only have to find a way to break one to cause big issues.
exitheone|2 years ago