From The Devouring Fungus (Karla Jennings, 1990), chapter 6:
—
[…] Imagine the frustration of the Stanford University instructor with an especially thick repairman who kept showing up to fix the same computer. He’d stay for a few hours, poke around, announce the machine fine, and leave; but he never knew what he was doing, so the instructor kept calling him back because the machine was never fixed.
Finally the instructor got so mad that he said “Let’s do a binary switch. The problem has got to be one of the circuit boards.” The repairman didn’t know what a binary switch was, but he agreed because he had an angry customer.
A binary switch involves taking half the circuit boards from the machine having trouble and switching them with identical boards in another machine. If the problem’s been transferred to the new machine, you know that something’s wrong with the switched boards. Then you reswitch smaller and smaller number of boards until you pinpoint which board is defective.
They switched half the boards and, sure enough, the problem moved to the other machine. The repairman said, “So what do we do now?” So they swapped half of the changed boards. The trouble was that now both machines worked. (Perhaps shaking the boards around had put a loose contact back in place.)
“That’s great!” said the repairman. “I’m going to use this procedure from now on. I never knew you could fix a machine just by swapping boards!”
I was fortunate enough to learn this strategy outside of a software context first. In grad school I was responsible for maintaining our GC-MS and LC-MS machines (Gas/Liquid Chromatograph / Mass Spectrometer) in addition to using them for my own experiments on a regular basis. They were complex pieces of equipment that had a ton of fiddly components.
The best way to troubleshoot these machines was to split the problem in half again and again until you found the part that needed to be cleaned, repaired, or replaced. The way to do that was to run different known chemicals with their own known signatures through the machine again and again. For example, you could run a single chemical (maybe ethanol - I honestly don't remember which ones at this point) and see if it goes straight through the GC and gives you the peak you expect. If it doesn't you could look at the result and see if the timing was off (indicates something with the GC or gas flow) or if the peak was wonky (indicates something with the MS). And then you just keep going. (Sounds a lot like unit and integration tests right?)
Applying that same strategy works wonders for data pipelines as well. Is it something with the extractor or loader (or god forbid Airflow)? Break it down and go from there.
My only nit on this is that while bifurcate is technically accurate (or is it actually?) it feels unnecessarily complex sounding for folks learning this skill.
I've always used "divide and conquer" which is a cliche for a reason.
Bifurcate is commonly used by Indian colleagues, and perhaps programmers are familiar with the term, but it's not something used by US business people in my experience.
I can't find this post from a physicist, but they had their bicycle stolen and the police had given up searching the CCTV log because it was taking them too long, despite the fact that it was pointed at the bicycle.
He told them that even if the CCTV log went back to the earth's formation, they would only need to take about 57 stills to pinpoint the second the bike was stolen (apparently they were not persuaded).
I consider myself a "good" debugger, and I don't really know why. Reading this makes me think that I probably have several strategies, including this one, that I subconsciously use without having a label for them.
I really like this book (and have recommended it before on hn) http://debuggingrules.com/?page_id=31 - it helps to have a shared team vocab for some of the concepts.
Sure, and of course this is the essence of printf-debugging. You probably have a rough idea where the problem is, so throw in some printf statements to narrow it down, then maybe a 2nd set/etc to further narrow it as needed.
A brute force (non-targetted) binary search may always be an option if you have a problem where you don't even know where to look.
The principle reminds me of a “bifurcation algorithm” I found to bifurcate and explore parameter space in a sort of novel way. It has its drawbacks! But it was fun to tinker with, too.
Effectively, we define some function to map N -> B, where N is the natural integers and B is in [0, 1]. (We can then apply some arbitrary function on B for our actual parameter value). We find the initial values as 0, 1, 0.5, 0.25, 0.75; that is: the extrema, all halves (1), all missed quarters (2), missed eighths (4), missed sixteenths (8), and so on. I used some binary representation of N to get B; there are probably other ways to do it.
I found this pretty useful at bifurcating a highly dimensional parameter space – it meant that made sure at all times I could get a decently equal view of all points (with bias towards B=0) in expanding detail. It also meant I didn’t have to think about allocation time, which was useful on a somewhat packed (but use-as-needed) compute cluster.
Sounds a lot like quasi-random low-discrepancy sequences like Sobol or Halton. Very effective tool to accelerate Monte Carlo integrations, which boils down to the effectiveness of exploring the phase space (but you have to make sure that sequences in different dimensions are independent!).
Asking questions which yield more bits of information is good practice. Bifurcating a space creates higher entropy signals when you finally measure.
A greyscale camera can tell you white or black and you could differentiate day from night. But a color camera can tell you red white blue black and so now you can derive sunrise sunny cloudy night. The set of observables is what you partition into signals.
Enhancing your observable space lets you derive stronger signals to act on.
I do this, because I like dumb solutions. This, along with challenging assumptions, is basically how I debug. I love git bisect, a lot of people don't even know it's there.
I'll note that, when pair programming, the people who sit and reason about the code often outperform me, so the way I look at it I'm trading time against complexity.
I work for a telecom, and I've heard techs suggest something similar to find/fix problems in the field. If you can't find the source of an intermittent problem, then change/replace something (anything!) to narrow scope for next time it happens.
The fact that the author is "commenting things out" makes me think that they don't have access to a debugger on their environment. When I find myself on that particular situation, I tend to:
* Add lots of logs (e.g. "entered function X with parameters a, b, c", "exited function X, return value W").
* "Make things explode in a useful way". This is, cause an error on purpose, which halts the program. This is useful for debugging server-like environments where many different things might happen at the same time, so the logs become insufficient. It is way slower than having a debugger doing step-by-step scrutiny, but it can be done.
"Commenting out" is often meant informally, not to be taken literally.
Your debugger may not work with a multithreaded program. Or it might cause the bug to disappear because it changes scheduling/execution order. Printf is more reliable in this context.
Or they may be debugging hardware, which the debugger can't step into. Or debugging object code without symbols.
Or they may be debugging a distributed system with remote code execution and asynchrony.
Or they may be debugging a language that doesn't have an execution stack.
Or they may be debugging an issue that only happens in production or customer machines that they can't access directly.
Or they may be debugging an issue that only happens very rarely, longer than the patience of a programmer to wait for a breakpoint.
Debuggers are one tool we have available, but hardly the only tool, and definitely not a complete tool.
teddyh|3 years ago
—
[…] Imagine the frustration of the Stanford University instructor with an especially thick repairman who kept showing up to fix the same computer. He’d stay for a few hours, poke around, announce the machine fine, and leave; but he never knew what he was doing, so the instructor kept calling him back because the machine was never fixed.
Finally the instructor got so mad that he said “Let’s do a binary switch. The problem has got to be one of the circuit boards.” The repairman didn’t know what a binary switch was, but he agreed because he had an angry customer.
A binary switch involves taking half the circuit boards from the machine having trouble and switching them with identical boards in another machine. If the problem’s been transferred to the new machine, you know that something’s wrong with the switched boards. Then you reswitch smaller and smaller number of boards until you pinpoint which board is defective.
They switched half the boards and, sure enough, the problem moved to the other machine. The repairman said, “So what do we do now?” So they swapped half of the changed boards. The trouble was that now both machines worked. (Perhaps shaking the boards around had put a loose contact back in place.)
“That’s great!” said the repairman. “I’m going to use this procedure from now on. I never knew you could fix a machine just by swapping boards!”
—
agumonkey|3 years ago
tayloramurphy|3 years ago
The best way to troubleshoot these machines was to split the problem in half again and again until you found the part that needed to be cleaned, repaired, or replaced. The way to do that was to run different known chemicals with their own known signatures through the machine again and again. For example, you could run a single chemical (maybe ethanol - I honestly don't remember which ones at this point) and see if it goes straight through the GC and gives you the peak you expect. If it doesn't you could look at the result and see if the timing was off (indicates something with the GC or gas flow) or if the peak was wonky (indicates something with the MS). And then you just keep going. (Sounds a lot like unit and integration tests right?)
Applying that same strategy works wonders for data pipelines as well. Is it something with the extractor or loader (or god forbid Airflow)? Break it down and go from there.
My only nit on this is that while bifurcate is technically accurate (or is it actually?) it feels unnecessarily complex sounding for folks learning this skill.
ianmcgowan|3 years ago
Bifurcate is commonly used by Indian colleagues, and perhaps programmers are familiar with the term, but it's not something used by US business people in my experience.
mattbee|3 years ago
He told them that even if the CCTV log went back to the earth's formation, they would only need to take about 57 stills to pinpoint the second the bike was stolen (apparently they were not persuaded).
joshjje|3 years ago
darkerside|3 years ago
ianmcgowan|3 years ago
There's a summary set of slides here that gives a great overview: https://courses.cs.washington.edu/courses/cse474/18wi/pdfs/l...
ChrisMarshallNY|3 years ago
My original training was as an electronic technician; specifically, an RF tech.
We would debug through a signal path.
The very first thing they taught us to do, was stick a probe in the halfway point, and see if the signal was what it was supposed to be.
HarHarVeryFunny|3 years ago
A brute force (non-targetted) binary search may always be an option if you have a problem where you don't even know where to look.
yunruse|3 years ago
Effectively, we define some function to map N -> B, where N is the natural integers and B is in [0, 1]. (We can then apply some arbitrary function on B for our actual parameter value). We find the initial values as 0, 1, 0.5, 0.25, 0.75; that is: the extrema, all halves (1), all missed quarters (2), missed eighths (4), missed sixteenths (8), and so on. I used some binary representation of N to get B; there are probably other ways to do it.
I found this pretty useful at bifurcating a highly dimensional parameter space – it meant that made sure at all times I could get a decently equal view of all points (with bias towards B=0) in expanding detail. It also meant I didn’t have to think about allocation time, which was useful on a somewhat packed (but use-as-needed) compute cluster.
davrosthedalek|3 years ago
nh23423fefe|3 years ago
A greyscale camera can tell you white or black and you could differentiate day from night. But a color camera can tell you red white blue black and so now you can derive sunrise sunny cloudy night. The set of observables is what you partition into signals.
Enhancing your observable space lets you derive stronger signals to act on.
flir|3 years ago
I'll note that, when pair programming, the people who sit and reason about the code often outperform me, so the way I look at it I'm trading time against complexity.
HarHarVeryFunny|3 years ago
Y_Y|3 years ago
metadat|3 years ago
Bifurcation means divide, split, or fork something into two.
jacknews|3 years ago
'bifurcate the problem space' sounds so Silicon Valley.
unknown|3 years ago
[deleted]
otikik|3 years ago
* Add lots of logs (e.g. "entered function X with parameters a, b, c", "exited function X, return value W"). * "Make things explode in a useful way". This is, cause an error on purpose, which halts the program. This is useful for debugging server-like environments where many different things might happen at the same time, so the logs become insufficient. It is way slower than having a debugger doing step-by-step scrutiny, but it can be done.
smilliken|3 years ago
Your debugger may not work with a multithreaded program. Or it might cause the bug to disappear because it changes scheduling/execution order. Printf is more reliable in this context.
Or they may be debugging hardware, which the debugger can't step into. Or debugging object code without symbols.
Or they may be debugging a distributed system with remote code execution and asynchrony.
Or they may be debugging a language that doesn't have an execution stack.
Or they may be debugging an issue that only happens in production or customer machines that they can't access directly.
Or they may be debugging an issue that only happens very rarely, longer than the patience of a programmer to wait for a breakpoint.
Debuggers are one tool we have available, but hardly the only tool, and definitely not a complete tool.