nik282000|1 month ago
This plant is operated and designed to the spec of an international corp with more than 20 factories; it's not a mom-and-pop operation. No one seems to think the excessive, useless alarms are an issue, and any damage caused by missed warnings is treated as the fault of the operator. When approaching management and engineering about this, the responses range from "it's not in the budget" to "you're maintenance, fix all the problems and the alarms will go away".
The only way for this kind of issue to be resolved is with regulation and safety standards. An operator can't safely operate equipment when alarms are not filtered or sorted in some way. It's like forcing your IT guy to watch web server access logs live to spot vulnerabilities being exploited.
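To make the log analogy concrete, here is a minimal sketch of the kind of filtering and sorting the comment asks for: suppress repeats of the same alarm within a window and only surface alarms at or above a severity threshold. All names, levels, and windows here are invented for illustration, not taken from any real plant system.

```python
from collections import defaultdict
from time import monotonic

# Illustrative severity ordering (assumed, not from any standard).
SEVERITY = {"info": 0, "warning": 1, "critical": 2}

class AlarmFilter:
    """Toy alarm filter: severity threshold plus repeat suppression."""

    def __init__(self, min_severity="warning", repeat_window=60.0):
        self.min_level = SEVERITY[min_severity]
        self.window = repeat_window
        self.last_seen = defaultdict(lambda: float("-inf"))

    def accept(self, alarm_id, severity, now=None):
        """Return True if this alarm should reach the operator."""
        if SEVERITY[severity] < self.min_level:
            return False  # below threshold: log it, don't page anyone
        now = monotonic() if now is None else now
        if now - self.last_seen[alarm_id] < self.window:
            return False  # same alarm within the window: suppress repeat
        self.last_seen[alarm_id] = now
        return True
```

Even this crude two-rule filter would turn a wall of repeated messages into a short list an operator can actually act on.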
pstuart|1 month ago
Absolutely, and we'd collectively be better served if we had tools to deal with it.
I think of it as "incentive ecology" -- as noted, everybody has their own incentives that shape their behavior, which causes downstream issues that begin the process anew.
Obviously there's no simple one-shot solution to this, but what if we had ways to simplify and model this "web of responsibility" (some sort of game theory exposed as an easily consumed presentation, with computed outcomes that show the cost/ROI/risk/reward) that could be shared by all stakeholders?
Obscurity and deniability are the weapons wielded in most of these scenarios, so what if we could render them obsolete?
Sure, those in power would not want to yield their advantages, but the overall outcomes should reward everybody: minimizing risks and maximizing rewards for the enterprise means everybody wins.
Yes, I'm looking at it as an engineer and a dreamer, but I think if such a tool existed that was open source and easily accessible, this work could be done by rogue participants who put it out there so it's undeniable.
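A toy instance of the "computed outcomes" idea: give each stakeholder choice an upfront cost and a risk-weighted expected loss, and the tradeoff becomes explicit instead of deniable. Every figure below is invented purely for illustration.

```python
# Toy expected-cost model for a "web of responsibility" comparison.
# All probabilities and dollar amounts are made up.

def expected_total_cost(upfront_cost, incident_probability, incident_cost):
    """Upfront spend plus probability-weighted cost of an incident."""
    return upfront_cost + incident_probability * incident_cost

# Option A: spend on alarm rationalization, incident risk drops.
fix_alarms = expected_total_cost(50_000, 0.01, 2_000_000)    # 70,000.0

# Option B: spend nothing, keep the current risk.
ignore_issue = expected_total_cost(0, 0.20, 2_000_000)       # 400,000.0
```

Obviously a real model needs defensible probabilities, but even this form makes "it's not in the budget" an arguable claim rather than a conversation-ender.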
bluGill|1 month ago
If there are a lot of issues, the lawyers will also ask why they were not corrected first, using that to establish a pattern of bad maintenance.
renewiltord|1 month ago
After all, read any post-mortem comments on HN. Many of those people can be hired as experts if you like. They will say "I would have put an alert on it and had testing". You will lose the case.
"Oh, but we are trying to keep the error rate low." Yes, but now your company is dead while the high-error-rate company is alive.
In revealed preferences, most engineers prefer vendors who have CYA. This is obvious from online comments. It is not because they are engineers. It's because most people want to believe the event is a freak accident.
Building a system around an error budget is not actually easy, even for engineers who say they want it, because when an error happens they immediately say it should not have happened. The counterfactuals (the other errors that were blocked, and the business still existing) are not considered. Every engineer is a genius in hindsight. Every person is a genius in hindsight.
Why do these geniuses never build a failure-proof company? They do not. Who would not pay the same price for 100% reliable tech?
mmooss|1 month ago
Those people have valuable input on issues the engineer may not understand and have little experience with. And engineers are just as likely to take the easy way out, like the caricature in the parent comment:
For example, for the manufacturer's engineering team it's much easier, faster, and cheaper to slap an alarm on everything than to learn attention management and to think through and create an attention management system that is effective and reliable (and it had better be reliable - imagine if it omits the wrong alarms!). I think anyone with experience can imagine the decision not to delay the project and increase costs for that subproject - one that involves every component team, is a priority for almost none of them, and whose need many engineers, such as the mechanical engineer working on the robotic arm, won't even understand.
> And China's society is run by engineers, so it will win out over ours.
History has not been kind to engineers who do non-engineering, such as US President Herbert Hoover, who built dams but also bore significant responsibility for the Great Depression. It's not that engineers can't acquire other skills and do well in those fields, but that other skills are needed - they aren't engineering. Those who accept as truth their natural egocentric bias, and their professional community's bias toward engineering, are unlikely to learn those skills.
anonymousiam|1 month ago
Unfortunately, some systems either don't track criticality, or some of the alerts are tagged with the wrong level.
(One example of the latter is the Ruckus WAP, which has a warning message tagged at the highest level of criticality, so about two or three times a month, I see the critical alert: "wmi_unified_mgmt_rx_event_handler-1864 : MGMT frame, ia_action 0x0 ia_catageory 0x3 status 0x0", which should be just an informational level alert, with nothing to be done about it. I've reported this bug to Ruckus a few times over the past five years, but they don't seem to care.)
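When the vendor won't fix a mis-tagged alert, one workaround is a local severity-override table applied before alerts reach the operator. The pattern key below matches the Ruckus message quoted above; the override mechanism itself and the level names are assumptions for illustration, not a real Ruckus or syslog feature.

```python
# Hypothetical local override table for alerts whose vendor-assigned
# severity is known to be wrong. Substring matching is a simplification;
# a real system might use regexes or structured event IDs.
OVERRIDES = {
    "wmi_unified_mgmt_rx_event_handler": "info",  # vendor tags it critical
}

def effective_severity(message, vendor_severity):
    """Return the locally corrected severity for an alert message."""
    for pattern, level in OVERRIDES.items():
        if pattern in message:
            return level
    return vendor_severity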
miki123211|1 month ago
The more of them you have, the more likely it is that there's a warning if something happens. Whether the warning is ever noticed is secondary; what matters is the fact that there was a warning and the operator didn't react to it appropriately, which makes the situation the fault of the operator.
cucumber3732842|1 month ago
In the eyes of regulators and courts, individual low-level employees cannot take responsibility. This is the logic by which they fine the company when someone does something you shouldn't need to be told not to do on a step ladder, or whatever.
What this means is that low-level employees become liability sinks. Show them all the warnings and make them figure it out. Give them all sorts of conflicting rules and let them sort out which ones to follow. Etc.
CamperBob2|1 month ago
Are you sure that's not what caused the problem in the first place? Unqualified and/or captured regulators who come up with safety standards that are out of touch with how the system needs to work in the real world?
AlotOfReading|1 month ago
Regulators coming up with engineering standards is pretty rare in general. Usually they incorporate existing professional standards from organizations like SAE, IEEE, IEC, or ISO.
bsder|1 month ago
The problem at TMI was that the teletypewriter delivering the alerts wasn't fast enough to finish typing before new alerts came in. As time went on, the information it was emitting got further and further behind. Even if the operators wanted to make intelligent decisions, they were operating on hours old data that no longer applied.
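The failure mode described above is simple queueing arithmetic: when alarms arrive faster than the printer can emit them, the backlog, and therefore the age of the information the operator is reading, grows without bound. A back-of-envelope sketch (the rates are illustrative, not historical TMI figures):

```python
# Toy model of a printer that falls behind its alarm stream.
# Rates are invented for illustration.

def backlog_after(seconds, arrivals_per_sec, printed_per_sec):
    """Number of unprinted alarms after `seconds` of sustained load."""
    return max(0.0, (arrivals_per_sec - printed_per_sec) * seconds)

# 10 alarms/s arriving against 2/s printed: after one hour the operator
# is reading output that is tens of thousands of alarms behind.
hour_backlog = backlog_after(3600, 10, 2)
```

The point is that any sustained excess of arrival rate over service rate makes the display arbitrarily stale; no amount of operator skill compensates for hours-old data.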