multisport | 16 days ago

Genuinely one of the most shocking incident reports I have read in a long time; it rivals https://www.coderabbit.ai/blog/our-response-to-the-january-2...

christophilus|16 days ago

What’s shocking about it? Seems like the usual culprit: a bad config rollout. It took a long time to identify, so maybe that’s the shocking part. But I can attest that you sometimes get into fight-or-flight mode and miss the obvious when trying to diagnose a disruption like this.

That said, nowadays, the first thing I do is spawn an agent to look through the most recent commits and try to identify something that could be the cause of a service outage.

This one seems like something Claude Code or Codex would have quickly flagged.
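
For illustration only, the "look through the most recent commits" step could be approximated without an agent by a simple time-window filter. This is a rough sketch under stated assumptions, not the workflow described above: the `Commit` record, the `suspect_commits` helper, the two-hour window, and the list of config file suffixes are all hypothetical, and the commit data is assumed to have been collected already (for example from `git log`).

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class Commit(NamedTuple):
    """Hypothetical record for one commit; not tied to any real tool's output."""
    sha: str
    committed_at: datetime
    files: tuple[str, ...]


def suspect_commits(
    commits,
    outage_start,
    window=timedelta(hours=2),  # assumed look-back window
    config_suffixes=(".yaml", ".yml", ".json", ".toml"),  # assumed config file types
):
    """Return commits that touched a config file shortly before the outage.

    Results are ordered newest first, so the change closest to the
    outage start is listed at the top.
    """
    suspects = []
    for commit in commits:
        lag = outage_start - commit.committed_at
        landed_just_before = timedelta(0) <= lag <= window
        touches_config = any(f.endswith(config_suffixes) for f in commit.files)
        if landed_just_before and touches_config:
            suspects.append(commit)
    return sorted(suspects, key=lambda c: c.committed_at, reverse=True)


if __name__ == "__main__":
    outage = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)  # made-up timestamp
    history = [
        Commit("a1", outage - timedelta(minutes=5), ("net/routes.yaml",)),
        Commit("b2", outage - timedelta(hours=5), ("net/routes.yaml",)),
        Commit("c3", outage - timedelta(minutes=30), ("README.md",)),
    ]
    for commit in suspect_commits(history, outage):
        print(commit.sha, commit.committed_at.isoformat(), ", ".join(commit.files))
```

A filter like this only narrows the list; deciding which change actually caused the disruption still needs a human or an agent reading the diffs.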

multisport|16 days ago

Agreed, we've all been there, but 4 hours! For a network config change. No one raised their hand and said, "Hey, I just toggled this thing, maybe we should look; I did it exactly when our entire region went down"?