(no title)
rfreiberger | 4 years ago
Yes, it's a simple mistake but how was a system allowed access to our global environment that this edge case was never calculated? In many of the meetings, the common issue is communication even between co-workers on the same team, and between internal platform providers. One case was an outage on the storage backend and realized after a long meeting that the internal SLA was much greater than we expected (and which the systems would timeout). It only worked for so long as storage utilization was extremely low.
eternityforest|4 years ago
Programming culture has almost football field level of "No time for weakness" attitude.
If I see a possible failure mode of a system, and bring it up, someone's going to tell me to stop being a clicky click windows idiot and learn to be careful.
Trying to prevent human error in software isn't seen as a priority so nobody does it. They are concerned with the most reliable code rather than the most reli4 code-user-hardware-task-schedule-conditions system.
Programmers need to accept software fixes for human and hardware failures. It's a lot easier to add a confirmation dialog than it is to somehow become 100% reliable at not clicking the wrong thing.