netdevnet | 1 year ago

> It's shockingly stable. So much so that resolving the root cause isn't considered a priority and so we've had this running for months.

I don't know why my senses tell me that this is wrong, even if you can afford it.

Retric|1 year ago

> I don't know why my senses tell me that this is wrong

The fix is also hiding other issues as they show up. So the system degrades over time, and eventually you’re stuck trying to solve multiple problems at the same time.

pmarreck|1 year ago

^ This is the problem. Not only that, solving 10 bugs (especially those more difficult nondeterministic concurrency bugs) at the same time is hideously harder than solving 1 at a time.

As a Director of Engineering at my last startup, I had an "all hands on deck" policy as soon as any concurrency bug was spotted. You do NOT want to let those fester. They are nondeterministic, infrequent, and exponentially dangerous as more and more appear and are swept under the rug via "reset-to-known-good" mitigations.

crabbone|1 year ago

These guys might be looking to match the fame of SolarWinds.