zzyzxd | 8 months ago

The article is unnecessarily long, and all it does is brag that "a service we didn't use went down, so it didn't affect us". If I want to be picky, their architecture is also not perfect:

- Their alerts were not durable. The outage took out the alert system, so humans were just eyeballing dashboards during the outage. What if your critical system went down along with that alert system, in the middle of the night?

- The cloud marketplace service was affected by the Cloudflare outage and there was nothing they could do.

- Tiered storage was down and disk usage went above normal levels, but there was no anomaly detection and no alerts. It survived because t0 storage was massively overprovisioned.

- They took pride in using well-known industry designs like cell-based architecture, redundancy, multi-AZ... ChatGPT would be able to give me a better list.

And I don't get why they had to roast CrowdStrike at the end. I mean, the CrowdStrike incident was really amateur stuff, like, the absolute lowest bar I can think of.
