top | item 25472804

blaisio | 5 years ago

Hmm, one thing that jumped out at me was the organizational mistake of having a very long automated "grace period". This is actually bad system architecture. Whenever you have a timeout for something that involves a major config change like this, the timeout must be short (like less than a week). Otherwise, it is very likely people will forget about it, and when it finally fires it will take a while for anyone to recognize and fix the problem. The alternative is to just use a calendar and have someone manually flip the switch when they see the reminder pop up. Overreliance on automated timeouts like this is indicative of a badly designed software ownership structure.

paxys|5 years ago

We once found a very annoying bug caused by someone setting a feature flag to a tiny rollout % and then leaving the company without updating it. It sat that way for 2 years before someone finally noticed.
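
For what it's worth, a periodic check for flags stuck at a partial rollout would probably have caught this. A rough sketch, assuming the flag system can tell you when each flag was last changed; the `Flag` record, its fields, and the 30-day threshold are all made up for illustration, not any real flag service's API:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Flag:
    # Hypothetical record; real flag systems will expose different fields.
    name: str
    rollout_percent: float
    last_changed: datetime
    owner: str


def stale_partial_rollouts(flags, now, max_age=timedelta(days=30)):
    """Return flags that are partially rolled out and untouched for too long."""
    return [
        f for f in flags
        # Only partial rollouts count: 0% and 100% are settled states.
        if 0 < f.rollout_percent < 100 and now - f.last_changed > max_age
    ]
```

Run it on a schedule and send the result to the owning team rather than the individual owner, since the individual may no longer be around.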

NoodleIncident|5 years ago

What's insane to me is that a grace period is built in, yet the mere fact that it is active isn't a giant neon sign on their dashboards and alerts. I do see how it could slip through the cracks, since it was the reported usage that was wrong, not the quota itself.

sleepydog|5 years ago

I agree, and even if the grace period were a good idea, enforcement should have slowly ratcheted up over the grace period, rather than having full enforcement immediately after it expired.
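
To make the ratcheting idea concrete, here is a minimal sketch that interpolates linearly from the old quota to the new one across the grace period. The function names and the linear ramp are my own assumptions, not how the system in the postmortem works:

```python
from datetime import datetime, timedelta


def enforcement_fraction(now, grace_start, grace_length):
    """Share of the new limit to enforce: 0.0 at the start, 1.0 at expiry."""
    if grace_length <= timedelta(0):
        return 1.0  # no grace period configured, enforce fully
    fraction = (now - grace_start) / grace_length
    # Clamp so times before the start or after expiry stay in [0, 1].
    return max(0.0, min(1.0, fraction))


def effective_quota(old_quota, new_quota, fraction):
    """Move linearly from the old quota toward the new one."""
    return old_quota + (new_quota - old_quota) * fraction
```

Halfway through a 30-day grace period, a quota dropping from 100 to 0 would be enforced at 50, so a bad usage report shows up as partial degradation well before the cliff.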

tantalor|5 years ago

This is also called a "time bomb". It's a bad thing.