A good incident postmortem

28 points| taspeotis | 8 years ago |blogs.msdn.microsoft.com

[+] kerng|8 years ago|reply
The article doesn't appear to mention the root cause. It's very long and somewhat difficult to read, but it might be me; it's early Saturday morning here.

But what was the root cause? I'd expect to see that in the first paragraph of a post-mortem. Was it a buffer overflow, an unhandled exception because of xyz? Did I miss it?

[+] blibble|8 years ago|reply
I wouldn't say it was good at all. It's mostly a load of irrelevant charts showing mostly noise, and the actual cause of the problem is summed up as "Unfortunately, that telemetry had a bug in it."

I'm surprised they can ever figure anything out if they're using those charts.

Not a patch on some of the Cloudflare, AWS or Google Cloud postmortems.

[+] johnmax|8 years ago|reply
Personal taste: I miss reading about the root cause of the problem in the first paragraph. I spent 5 minutes reading and found nothing, so I stopped altogether.