Took me a minute to figure out what was wrong. If IsLeapYear() returns true, and "days" equals 366, you get stuck in an infinite loop (you don't modify "days" at all).
This was the first leap year since Zune was released, and yesterday was the 366th day of the year, sooooo....
Normally, gotos wouldn't _necessarily_ be bad, but at least one of these is really really bad.
Consider the function OALIoCtlHalInitRTC, lines 122 and 139-143. The last else block has no obvious terminator, but it looks like the else is empty because of the comment and the indentation. So the line after the label cleanUp is the body of the else. This means that if the if on line 131 is true, the cleanUp code never gets executed. Now this may be intentional (but I doubt it since the label is named cleanUp), but that should be documented in the code.
I consider this use of a goto to be legitimate, since it avoids duplication of the cleanup code (and the last thing you want is to have multiple similar blocks of code that need to be maintained/modified in tandem). This idiom often appears in the linux kernel IIRC. But this code is quite terrible since it doesn't use the idiom correctly, produces an unexpected execution path, is a problem to maintain, and is a problem to read. It seems that it ends up being a no-op since the line after the cleanUp label is a debugging line or something, but that doesn't make this a good practice.
But really, this isn't really a problem with the use of goto, it's a problem with the optional braces on single statement blocks in C. The goto and the label chosen just make it harder to spot. I've made it a habit to use braces around all blocks, even empty ones, to make the intent clearer and avoid this problem.
A lesson in code smell and good style: I took one look at the code and (before noticing the bug) thought, “why is there an inner if instead of just using a single conditional with a logical AND?”
That is to say, I could have fixed the bug without knowing what it was. (I'm a hs student and don't do much bug-shooting in code that wasn't written by me or my close friends, so I don't know if this is common.)
Last I checked, there haven't been any major overhauls to the Gregorian calendar since C was invented. So why aren't we all using the same, fully-debugged date library yet?
Edit: Actually, I'm cutting it close :-). The last time we changed the calendar was in 1972, when we settled on integer leap seconds -- the same year that C was invented.
Hear, hear. It's strange how often we have to get bitten by date and time bugs. I think people ignore the wisdom of Peter van der Linden in the excellent book "Expert C Programming":
"Anyone who thinks programming dates and times is easy probably hasn't done much of it."
This should work if that was a >= sign. And people at work don't believe me when I say how insanely careful I have to be when I write any code, especially code that deals with people's payroll and vacation time. One missing equals symbol and every Zune in the world froze. One missing equals symbol and every employee could lose 1/12 of their accumulated vacation time.
The ability to pay attention to "mundane details" is just as important as the ability to envision the whole software application from the project manager point-of-view, especially when working in smaller companies without additional oversight.
> One missing equals symbol and every Zune in the world froze.
Wrong. Completely ignoring all software engineering "best practices" was the problem. They could have reused a working library. They could have written unit tests tests. QA could have tested it. Other developers could have reviewed it, or written test cases without knowledge of how the code worked.
Basically, they ignored every rule for writing good code, and then ended up with a serious bug. Well, yeah. The whole "unit testing" "fad" exists for a reason -- it greatly improves the quality of software.
Many people take software development seriously enough to not let one brain fart cause a piece of software to brick thousands of devices.
Really, this looks like a rather silly way to put this function together. It doesn't even run in constant time! It might be a little complicated, due to the leap year rules, but it would probably not be too difficult to do this calculation without any loops.
No. That's incorrect. That would cause the year to go up on Dec. 31st (day 366), turning it into day 0. In fact, what you want is to change the while to an if statement. If it really does need to run as a while statement (to add more than one year at once?), then the alternative would be a break statement after line 271.
Someone at QA it's going to be fired. Corner cases like 1st and last days of a year(leap and not), 29th of February and the like are the first candidates for proper blackbox testing.
I wouldn't put all the blame on the programmer (who probably is now scared to death).
It's allegedly from a Freescale clock driver (see the comments at the top). I'm trying to find the original on Freescale's site, unsuccessfully thus far.
If it is indeed a Freescale-supplied driver issue, it makes me wonder how many other devices failed yesterday...
I find myself wanting to see the version history for that function. Is that how it was originally written? Was the bug introduced by a previous bug-fix? By a refactoring? Something else?
How did these guys (pastie.org) get hold of he Zune source code? I don't think Microsoft gives away any of their source to the public (except for some programs covered by MS's shared source license).
[+] [-] tlrobinson|17 years ago|reply
This was the first leap year since Zune was released, and yesterday was the 366th day of the year, sooooo....
Also spotted in this code: several "goto"s.
[+] [-] thwarted|17 years ago|reply
Consider the function OALIoCtlHalInitRTC, lines 122 and 139-143. The last else block has no obvious terminator, but it looks like the else is empty because of the comment and the indentation. So the line after the label cleanUp is the body of the else. This means that if the if on line 131 is true, the cleanUp code never gets executed. Now this may be intentional (but I doubt it since the label is named cleanUp), but that should be documented in the code.
I consider this use of a goto to be legitimate, since it avoids duplication of the cleanup code (and the last thing you want is to have multiple similar blocks of code that need to be maintained/modified in tandem). This idiom often appears in the linux kernel IIRC. But this code is quite terrible since it doesn't use the idiom correctly, produces an unexpected execution path, is a problem to maintain, and is a problem to read. It seems that it ends up being a no-op since the line after the cleanUp label is a debugging line or something, but that doesn't make this a good practice.
But really, this isn't really a problem with the use of goto, it's a problem with the optional braces on single statement blocks in C. The goto and the label chosen just make it harder to spot. I've made it a habit to use braces around all blocks, even empty ones, to make the intent clearer and avoid this problem.
[+] [-] noahlt|17 years ago|reply
That is to say, I could have fixed the bug without knowing what it was. (I'm a hs student and don't do much bug-shooting in code that wasn't written by me or my close friends, so I don't know if this is common.)
[+] [-] dfranke|17 years ago|reply
Edit: Actually, I'm cutting it close :-). The last time we changed the calendar was in 1972, when we settled on integer leap seconds -- the same year that C was invented.
[+] [-] projectileboy|17 years ago|reply
"Anyone who thinks programming dates and times is easy probably hasn't done much of it."
[+] [-] chime|17 years ago|reply
This should work if that was a >= sign. And people at work don't believe me when I say how insanely careful I have to be when I write any code, especially code that deals with people's payroll and vacation time. One missing equals symbol and every Zune in the world froze. One missing equals symbol and every employee could lose 1/12 of their accumulated vacation time.
The ability to pay attention to "mundane details" is just as important as the ability to envision the whole software application from the project manager point-of-view, especially when working in smaller companies without additional oversight.
[+] [-] tlrobinson|17 years ago|reply
I think the "while" condition should actually be
(or perhaps some simplification that I'm too lazy to figure out right now)And then the inner "if" condition can be eliminated. Thus the final code could be:
This seems logical and easier to understand than the original too.Edit: if you want to get ternary-operator fancy:
[+] [-] jrockway|17 years ago|reply
Wrong. Completely ignoring all software engineering "best practices" was the problem. They could have reused a working library. They could have written unit tests tests. QA could have tested it. Other developers could have reviewed it, or written test cases without knowledge of how the code worked.
Basically, they ignored every rule for writing good code, and then ended up with a serious bug. Well, yeah. The whole "unit testing" "fad" exists for a reason -- it greatly improves the quality of software.
Many people take software development seriously enough to not let one brain fart cause a piece of software to brick thousands of devices.
[+] [-] Jebdm|17 years ago|reply
[+] [-] thras|17 years ago|reply
[+] [-] unknown|17 years ago|reply
[deleted]
[+] [-] samuel|17 years ago|reply
I wouldn't put all the blame on the programmer (who probably is now scared to death).
[+] [-] snprbob86|17 years ago|reply
Do you have the rights to this code?
Some context would be nice...
[+] [-] tlrobinson|17 years ago|reply
If it is indeed a Freescale-supplied driver issue, it makes me wonder how many other devices failed yesterday...
[+] [-] mooism2|17 years ago|reply
[+] [-] thomie|17 years ago|reply
Leap = (Year%400) ? 0 : 1;
[+] [-] arjungmenon|17 years ago|reply
[+] [-] ConradHex|17 years ago|reply
[+] [-] unknown|17 years ago|reply
[deleted]