This issue comes up a lot in my role, where I'm often dealing with varied environmental conditions and human factors, plus multiple integration points between different software and hardware systems.
The answer is that you keep working at it iteratively using a combination of logging, reporting, and defensive programming to systematically narrow down the possible causes. Sometimes you never arrive at a true root cause, but you get close enough that you can mitigate the problem and finally close the ticket out. At the end of the day, the customer/user doesn't care as long as it works.
However, what will really piss them off is telling them your hands are tied until they can reliably reproduce the issue for you. It's important they understand that you are working on it, and typically they will go out of their way to help solve the problem when they feel taken care of.
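As an illustration, here is a minimal sketch of the logging and defensive programming described above, wrapped around a single integration point. The names `guarded_call`, `failure_counts`, and `failure_report` are hypothetical and do not come from any particular library. Each failure is logged with the environmental context at the time and tallied by cause. Over time, that tally points toward where an intermittent problem lives, even without a reliable repro.

```python
import logging
from collections import Counter

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("integration")

# Running tally of observed failure causes (hypothetical name). Reviewing it
# over weeks of field data narrows down which integration point misbehaves.
failure_counts = Counter()


def guarded_call(name, func, validate, context, fallback=None):
    """Call an integration point defensively (hypothetical helper).

    name:     label for the integration point, used in logs and the tally
    func:     zero-argument callable that talks to the other system
    validate: returns None if the result is acceptable, else a reason string
    context:  dict of environmental details logged alongside any failure
    fallback: value returned instead of propagating the failure
    """
    try:
        result = func()
    except Exception as exc:
        # Record the exception type as the cause and keep running.
        cause = f"{name}: {type(exc).__name__}"
        failure_counts[cause] += 1
        log.warning("%s raised %r; context=%s", name, exc, context)
        return fallback

    reason = validate(result)
    if reason is not None:
        # The call "worked" but returned something we should not trust.
        cause = f"{name}: invalid ({reason})"
        failure_counts[cause] += 1
        log.warning("%s returned %r (%s); context=%s", name, result, reason, context)
        return fallback
    return result


def failure_report(top=5):
    """Return the most frequent failure causes, most common first."""
    return failure_counts.most_common(top)
```

For example, wrapping a hypothetical sensor read with a range check and passing `{"site": ..., "shift": ...}` as context means every out-of-range reading is logged alongside the shift it happened on, and `failure_report()` shows whether those readings dominate. Returning a fallback instead of raising is the mitigation part: the system keeps working while the data accumulates.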
beart|7 months ago