kaiokendev | 1 year ago

Have been in situations just like this, on pretty much every side (the fuck-upper, the person who has to fix the fuck-up, and the person who has to come up with a fuck-up remediation plan).

The most egregious case involved an incompetent configuration that resulted in hundreds of millions of dollars in lost data and a 6-month-long automated recovery project. Fortunately, there were traces of the data across the entire stack - from page caches in a random employee's browser, to automated reports and OCR dumps. By the end of the project, all data was recovered. No one from outside ever found out or even realized anything had happened - we had redundancy upon redundancy across several parts of the business, and the entire company basically shifted the way we did ops to work around the issue for the time being. Every department had a scorecard tracking how many of their files were recovered, and we had little celebrations when we hit recovery milestones. To this day only a few people know who was responsible (wasn't me! lol).

Blame and derision are inevitable in situations like this. It's how it's handled afterwards that really marks the competence of the company.
