korethr | 4 years ago

Yes, but Facebook is not a small company. Could PagerDuty realistically handle the scale of notifications that would be required for Facebook's operations?

antoinealb|4 years ago

PagerDuty does not solve some of the problems you would have at FB's scale, like how do you even know who to contact? And how do they log in once they know there is a problem?

Spooky23|4 years ago

Sure. As long as you plan for disaster.

The place where I worked had failure trees for every critical app and service. The goal for incident management was to triage and have an initial escalation for the right group within 15 minutes. When I left they were like 96% on target overall and 100% for infrastructure.

robalfonso|4 years ago

Even if it can’t, it’s trivial to use it for an important subset, i.e., is Facebook.com down, is the NS stuff down, etc. So there is an argument to be made for still using an outside service as a fallback.

anigbrowl|4 years ago

Sure, if you're...

- not arrogant
- or complacent
- haven't inadvertently acquired the company
- know your tech peers well enough to have confidence in their identity during an emergency
- do regular drills to simulate everything going wrong at once

Lots of us know what should be happening right now, but think back to the many situations we've all experienced where fallback systems turned into a nightmarish war story, then scale it up by 1000. This is a historic day. I think it's quite likely that the scale of the outage will lead to the breakup of the company, because it's the Big One that people have been warning about for years.

jfrunyon|4 years ago

I guarantee you that every single person at Facebook who can do anything at all about this, already knows there's an issue. What would them receiving an extra notification help with?