top | item 416875

Source code of the file with the Zune bug (starts on line 249)

58 points| vaksel | 17 years ago |pastie.org | reply

33 comments

order
[+] tlrobinson|17 years ago|reply
Took me a minute to figure out what was wrong. If IsLeapYear() returns true, and "days" equals 366, you get stuck in an infinite loop (you don't modify "days" at all).

This was the first leap year since Zune was released, and yesterday was the 366th day of the year, sooooo....

Also spotted in this code: several "goto"s.

[+] thwarted|17 years ago|reply
Normally, gotos wouldn't _necessarily_ be bad, but at least one of these is really really bad.

Consider the function OALIoCtlHalInitRTC, lines 122 and 139-143. The last else block has no obvious terminator, but it looks like the else is empty because of the comment and the indentation. So the line after the label cleanUp is the body of the else. This means that if the if on line 131 is true, the cleanUp code never gets executed. Now this may be intentional (but I doubt it since the label is named cleanUp), but that should be documented in the code.

I consider this use of a goto to be legitimate, since it avoids duplication of the cleanup code (and the last thing you want is to have multiple similar blocks of code that need to be maintained/modified in tandem). This idiom often appears in the linux kernel IIRC. But this code is quite terrible since it doesn't use the idiom correctly, produces an unexpected execution path, is a problem to maintain, and is a problem to read. It seems that it ends up being a no-op since the line after the cleanUp label is a debugging line or something, but that doesn't make this a good practice.

But really, this isn't really a problem with the use of goto, it's a problem with the optional braces on single statement blocks in C. The goto and the label chosen just make it harder to spot. I've made it a habit to use braces around all blocks, even empty ones, to make the intent clearer and avoid this problem.

[+] noahlt|17 years ago|reply
A lesson in code smell and good style: I took one look at the code and (before noticing the bug) thought, “why is there an inner if instead of just using a single conditional with a logical AND?”

That is to say, I could have fixed the bug without knowing what it was. (I'm a hs student and don't do much bug-shooting in code that wasn't written by me or my close friends, so I don't know if this is common.)

[+] dfranke|17 years ago|reply
Last I checked, there haven't been any major overhauls to the Gregorian calendar since C was invented. So why aren't we all using the same, fully-debugged date library yet?

Edit: Actually, I'm cutting it close :-). The last time we changed the calendar was in 1972, when we settled on integer leap seconds -- the same year that C was invented.

[+] projectileboy|17 years ago|reply
Hear, hear. It's strange how often we have to get bitten by date and time bugs. I think people ignore the wisdom of Peter van der Linden in the excellent book "Expert C Programming":

"Anyone who thinks programming dates and times is easy probably hasn't done much of it."

[+] chime|17 years ago|reply
> Line #263: if (days > 366)

This should work if that was a >= sign. And people at work don't believe me when I say how insanely careful I have to be when I write any code, especially code that deals with people's payroll and vacation time. One missing equals symbol and every Zune in the world froze. One missing equals symbol and every employee could lose 1/12 of their accumulated vacation time.

The ability to pay attention to "mundane details" is just as important as the ability to envision the whole software application from the project manager point-of-view, especially when working in smaller companies without additional oversight.

[+] tlrobinson|17 years ago|reply
Making that ">=" would make that entire "if" check redundant, since we already know it's at least 365 (because of the "while" condition)

I think the "while" condition should actually be

    while ((days > 365 && !IsLeapYear(year)) || (days > 366 && IsLeapYear(year)))
(or perhaps some simplification that I'm too lazy to figure out right now)

And then the inner "if" condition can be eliminated. Thus the final code could be:

    while ((days > 365 && !IsLeapYear(year)) || (days > 366 && IsLeapYear(year)))
    {
        if (IsLeapYear(year))
            days -= 366;
        else
            days -= 365;
        year += 1;
    }
This seems logical and easier to understand than the original too.

Edit: if you want to get ternary-operator fancy:

    while (days > (IsLeapYear(year) ? 366 : 365))
    {
        days -= IsLeapYear(year) ? 366 : 365;
        year += 1;
    }
[+] jrockway|17 years ago|reply
> One missing equals symbol and every Zune in the world froze.

Wrong. Completely ignoring all software engineering "best practices" was the problem. They could have reused a working library. They could have written unit tests tests. QA could have tested it. Other developers could have reviewed it, or written test cases without knowledge of how the code worked.

Basically, they ignored every rule for writing good code, and then ended up with a serious bug. Well, yeah. The whole "unit testing" "fad" exists for a reason -- it greatly improves the quality of software.

Many people take software development seriously enough to not let one brain fart cause a piece of software to brick thousands of devices.

[+] Jebdm|17 years ago|reply
Really, this looks like a rather silly way to put this function together. It doesn't even run in constant time! It might be a little complicated, due to the leap year rules, but it would probably not be too difficult to do this calculation without any loops.
[+] thras|17 years ago|reply
No. That's incorrect. That would cause the year to go up on Dec. 31st (day 366), turning it into day 0. In fact, what you want is to change the while to an if statement. If it really does need to run as a while statement (to add more than one year at once?), then the alternative would be a break statement after line 271.
[+] samuel|17 years ago|reply
Someone at QA it's going to be fired. Corner cases like 1st and last days of a year(leap and not), 29th of February and the like are the first candidates for proper blackbox testing.

I wouldn't put all the blame on the programmer (who probably is now scared to death).

[+] snprbob86|17 years ago|reply
Source? (journalism, not code)

Do you have the rights to this code?

Some context would be nice...

[+] tlrobinson|17 years ago|reply
It's allegedly from a Freescale clock driver (see the comments at the top). I'm trying to find the original on Freescale's site, unsuccessfully thus far.

If it is indeed a Freescale-supplied driver issue, it makes me wonder how many other devices failed yesterday...

[+] mooism2|17 years ago|reply
I find myself wanting to see the version history for that function. Is that how it was originally written? Was the bug introduced by a previous bug-fix? By a refactoring? Something else?
[+] thomie|17 years ago|reply
And another leapyear bug on line 170? The year 2000 was a leap year, but here they seem to have the 0 and the 1 mixed up:

Leap = (Year%400) ? 0 : 1;

[+] arjungmenon|17 years ago|reply
How did these guys (pastie.org) get hold of he Zune source code? I don't think Microsoft gives away any of their source to the public (except for some programs covered by MS's shared source license).
[+] ConradHex|17 years ago|reply
Freescale gives away this source for free. It's driver source for some of their hardware.