> What do all these earlier mistakes have in common, apart from the obvious: being exemplars of “catastrophic loss of structural integrity”? They all date from before 2013. That’s how we know the NSA wasn’t involved.
I get that this is trying to make fun of the response to Apple's "goto fail;", but this logic ("These similar errors predate 2013 → Apple's similar error was not an NSA backdoor") seems rather faulty. There are a number of flaws with this line of reasoning. To name a few:
* The NSA could have been involved with backdoors before 2013 (unlikely in Debian, but mentioning 2013 is a bit of a red herring)
* Apple could have been encouraged to insert a backdoor and did so in a way that gave them plausible deniability (either because the NSA specified that, or because they wanted to cover their behinds)
Whether or not this incident was the result of the NSA's prompting is something we'll probably never know[0], but this article doesn't do much to argue one way or the other.
[0] The only way we could know is if someone literally came out and admitted it (or someone like Snowden were to leak it). It's possible to prove the existence of something (i.e., a backdoor attempt), but impossible to prove the absence of something.
My takeaway was that the NSA could have been involved in all the above, we just weren't paying as much attention before ;-)
Of course, whether the NSA was involved or not isn't the point of this article. The point is that high severity one-liner bugs have been made before.
I feel partly to blame. Until my comment on Hacker News revealing where the source code was, no one seemed willing to post more details about the bug (based on Twitter posts). But I think it will end well, as more people will pay attention to the code in future. Sadly, as far as I'm aware, Apple has yet to share revised source code for the security library shipping in 10.9.2.
It's pointless to discuss whether this is an intentional flaw or a mistake, because the real problem lies in code review and tooling. People have told me in the past that the compiler can catch these errors; if so, which flags do they use, and should we have a standard list of flags to enable for catching these problems? Do we have tools that surface these errors, instead of making us dig through a thousand-page build log?
If p => q, then ¬q => ¬p. So if p is "The goto fail was setup by the NSA", then surely we can come up with a reasonable q which could be (dis)provable?
Both are assertion-testers resulting in a core dump if the test fails, but ASSERT is only defined for DEBUG builds.
Since the assignments are to nonzero (true) values, buggy values of 'apples' are silently corrected -- but only during test runs. Buggy values of 'oranges' are always silently corrected.
Hilarity ensued. Afterwards, 'gcc -Wparentheses' became mandatory.
Cough, there's one major thing all of these bugs have in common - they're all in C code.
This is mostly due to the volume and prominence of C. But a language with the same fundamental semantics as C, minus the cleverbait syntax and weak types, would have prevented over half of these mistakes.
I would be more willing to blame C if these bugs were unsafe memory operations. Unused parameters, logic errors, and assignment instead of comparison aren't unique to C, and the language isn't without countermeasures for them.
The X one is probably the only distinctly-C bug on the list, and compiler warnings and/or a good static code checker would have found it.
The compiler could have prevented 'X' by not permitting "function == int" comparisons. That's 1/5.
The only thing you might be referring to with "cleverbait syntax" is the ++ fix in tarsnap, but note that the bug was not including ++. Being generous, we could say that if ++ wasn't allowed, it might have been more obvious that the code was broken, getting us 2/5. And being extra generous, maybe if `!i` was disallowed, they would have gone straight for `i <= 0` instead of `i != 0`. So 3/5, but yeah, dubious.
The only other one that's remotely like a C misfeature is that the compiler apparently didn't warn about 'unused parameter' in the Android bug.
The commentary on the last example, from Tarsnap, seems wrong. The error was that the nonce stopped being incremented (see the linked Tarsnap blog post), but the author suggests the issue there is the unbraced if. The unbraced if is still not great style, but it wasn't the cause of the security blunder.
It is kind of on purpose, that's the whole point of the writeup. It's taking a little jab at everyone who proclaims a "simple and obvious" way any given error could've been avoided. Fact is, making the programmer type more (braces or other stuff) doesn't save him from himself; the goto fail error could've been made with braces just as well as without.
The other examples make the same point. There are ways to help catch specific bugs, but there is no magic "fix it" button. Consider static analysis, for example: you actually have to run it and inspect the output. You have to interpret it right. You have to fix it right. And so on. Human error can ruin each step, as has happened.
I have done a similar thing to myself in Python, sabotaging my own little project. This sort of thing is not rare. Imagine this were initializing a random key in TLS... But since Python syntax is arguably cleaner and more readable, it's easier to catch.
But that's a bet we have to put up with...
Regarding the Debian OpenSSL disaster, the question was brought up on the OpenSSL mailing list (and read there); regarding the Tarsnap bug, it is not an unbraced if but a non-incremented nonce. If two of the five comments in an article are obviously wrong, how much trust should I possibly put into the remaining 60%?
Another thing they have in common is trying to reason about security of SSL certificate chains and other high abstraction concepts in terms of C pointers and function return values.
Maybe that will be my new way to measure the effectiveness of my test suite.
"Imagine the NSA snuck a one line 'fix' into your software overnight. Do your tests quickly and accurately detect the problem and point to the code that is broken? If not, your unit tests are broken."
A lot of arguments center around "well, the compiler should have done X!"
I don't write a lot of C code, but when I did, and even more relevant, when I compile packages, the compiler tends to spit out chapters' worth of warnings that I just ignore. How accepted is compiling C code with -Werror in practice? Could it be the compiler did warn about goto fail, and the OP's error, but it was one warning out of a million? Really interested in an answer from those who work in C shops.
The project I work on (QEMU) uses -Werror in development but defaults it to off for releases. This means that warnings don't sneak into the codebase but end users don't have to deal with unnecessary build failures just because they're using a newer and pickier compiler or oddball host system.
Putting in the effort to get down to "emit no warnings" (and then enforcing that you stay there with -Werror) is worth doing exactly because you then do get new problems drawn forcibly to your attention, IMHO.
I write my code with -Wall, -Werror, -pedantic, and a standard specified. If my code does something questionable, I need to instruct the compiler that I really mean for it to do that. Why tolerate unsafe and ambiguous behavior?
In my past experience writing and compiling C and C++ for radar software: the project was big, with a fair amount of legacy code, and there were a fair number of warnings in the C code. A lot of the code was in Ada, which I grew to really appreciate, but for some things we needed C.
I started work on a new, from-the-ground-up simulator in C++. The 12+ coders got used to warnings-as-errors (-Werror). I think it made us better coders. We were able to do that because the project was new from the ground up. Most projects in the radar field use a lot of previously tested legacy code that does issue warnings.
In the case of Xorg, if they had warnings enabled, it was one warning among pages and pages of them. Recently, though, they cleaned it up; 1.16 and onward should compile without warnings[0].
[0] http://keithp.com/blogs/xserver-warnings/
It's probably putting out warnings because you're being sloppy. Maybe that's ok, maybe it's not, but cleaning up warnings is like washing your hands on the way out of the restroom, it's good hygiene.
1. Start writing tests. Tests will save your ass when you're merging a feature branch. Because are you sure everything still behaves correctly after those ingenious refactorings?
2. Use an IDE and/or compiler that actually warns you about common human errors like unused variables, unreachable statements, and dumb mistakes like a case-block with silent fallthrough to the next case-block.
3. Acknowledge responsibility when writing security related code. Know that there's no room for mistakes.
Mostly human, and avoidable, but always present, because people don't take the time to do it correctly.
Quality, taking the time to make things right, slowly and correctly, matters more than genius. Quality code at every level is the key to secure development. Every little thing counts.
I know a few OpenBSD developers, and their song is about doing the right thing and taking the time to do it correctly: simpler, with quality, because that is the shortest path to correctness and thus speed (what is the use of an incorrect program? none). This song is appealing. The fact that they are strangely psychopathic, paranoid nerds is not ;)
I think the computer industry would be far better off if we slowed down and focused on building less software, but building it better.
Fewer features, maybe, but features that may be more powerful: a stable foundation for sustainable progress.
And if you like them (even though they are repulsive, antipathetic nerds) you can support them :)
In which case, I'm not sure what you're complaining about. Is it that memset here has a non-standard signature (v is supposed to be int, not unsigned)? Or that the "before" code ignored v? Or the un-braced loop body on the same line? Or is there some other problem I'm overlooking? (Which is possible; it's been about a decade since C was my main day-to-day language.)
I think it's ok. I was going to say "He should have used memset instead of this low level memory walking that belongs inside the implementation of... oh I see". This is a critical function; doing it the proper way (machine-workings, helping the compiler) is more important than immediate first-read understanding. Not that an experienced C programmer won't have that anyway.
What is particularly bad? How would you have done it differently? I've not done much C, so I can't see the issue, although is it perhaps the types? Would count be modified by side-effect in the calling context also?
Actually, with the Debian SSL "fix" that removed entropy, the dev who made the change did ask about it on IRC. Though a bad mistake, it's understandable how it happened.
In the example of the Apple bug, I've read it took eighteen months (!!) to discover it after it had been committed to the source code base. Consequently, I would be interested in knowing, for each bug, how long it took to discover.
Also, I assume that once discovered, the time to fix it is usually negligible. Usually. And as long as you don't count the time for full deployment in production...
I'm actually a little surprised about the OpenSSL one. I always assumed it would be good practice to number error codes negatively. On the other hand, that's also kind of expected behavior for reverse engineers: if the return value is negative, you can be almost certain it's an error code.
External symbols are resolved at link time; as far as the compiler is concerned, setuid might be a symbol declared as __weak.
I don't particularly like the snide remarks towards the maintainers of these libraries.
My number one takeaway from this is that we should all be writing more unit tests.
From http://www.daemonology.net/blog/2011-01-18-tarsnap-critical-... regarding the Tarsnap bug.
Solution: write more unit tests.
But thanks to those who support what we do!
[Now I wait for a deluge of replies about how it's "idiomatic" C. All I have to say to that is: s/ma// ]
I guess whoever wrote that line thought the same.