
Systemd is bad parsing

276 points | fanf2 | 7 years ago | blog.erratasec.com

183 comments

[+] ambrop7|7 years ago|reply
Yes the code is bad but does not need to be completely rewritten. It should interpret the data as an array uint8_t (no casting to structure/integer pointers!), use simple helper functions to read out (u)int(8/16/32) values (there was a post on HN about this recently, https://commandcenter.blogspot.com/2012/04/byte-order-fallac...), and be careful to check things like sizes.

The code is also wrong because of strict aliasing. This is a real problem, your program can in fact exhibit undefined behavior because of this (it happened to me).

Some time ago I wrote some code to make these things simpler in C++. It allows you to define "structures" (which however are not real structs, it's an abstraction) and then directly access an arbitrary buffer through setters/getters. The code is here with documentation: https://github.com/ambrop72/aipstack/blob/master/src/aipstac... . In fact this is part of my own TCP/IP stack and it includes DHCP code (here are my DHCP structure definitions: https://github.com/ambrop72/aipstack/blob/master/src/aipstac...).
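A minimal sketch of the helper-function approach from the first paragraph, in plain C (function names are my own, not from aipstack): fixed-width reads out of a raw byte buffer, with no struct/integer pointer casts and no dependence on host byte order.

```c
#include <stdint.h>

/* Read a 16-bit big-endian (network order) value from a byte buffer.
   p[0] promotes to int before the shift, which is safe for 8 bits. */
uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read a 32-bit big-endian value. Each byte is cast to uint32_t
   BEFORE shifting, so the shift by 24 never touches a signed int's
   sign bit (avoiding the promotion pitfall discussed below). */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

The caller is still responsible for checking that the buffer actually holds enough bytes before calling these.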

[+] moefh|7 years ago|reply
Interestingly (and pedantically), the code from the otherwise excellent "Byte order fallacy" article technically contains undefined behavior in C on machines where int is 32 bits wide (which is almost everything nowadays).

Consider this example of extracting a 32-bit int encoded as little endian:

    #include <stdio.h>

    int main(void)
    {
      unsigned char data[4] = { 0, 0, 0, 128 };
      unsigned int i;

      // the line of code from the article:
      i = (data[0]<<0) | (data[1]<<8) | (data[2]<<16) | (data[3]<<24);
    
      printf("%x\n", i);
    }
Compiling this with either gcc or clang using "-fsanitize=undefined" triggers this runtime error:

    test.c:9:61: runtime error: left shift of 128 by 24 places cannot be represented in type 'int'
That happens because, even though the bytes are unsigned as the article requires (using uint8_t instead of unsigned char has the same problem), they are promoted to (signed) int before the left shift as part of the automatic integer promotions. When the left shift by 24 happens, it puts a 1 into the sign bit of the int, which is undefined behavior. Later the int is converted to unsigned (implementation-defined behavior, which luckily is sane on all popular compilers), but by then it's already too late.

The solution is to change the code slightly:

    i = (data[0]<<0) | (data[1]<<8) | (data[2]<<16) | ((unsigned)data[3]<<24);
This prevents the promotion from unsigned char (or uint8_t) to int by explicitly casting it to an unsigned int, which can then be safely shifted left by 24.
[+] camgunz|7 years ago|reply
Yeah, casting to struct pointers is a bright red flag. It also rarely saves you anything, since packed structs are slow and you still have to validate the contents.

You're right, these streams should be interpreted as uint8_t to avoid undefined behavior when shifting, masking, and ORing, and anyone doing this should tuck it behind some abstraction layer. It should never bleed into parsing.

[+] jstimpfle|7 years ago|reply
As an array of uint8_t? I'm pretty sure that's wrong, and that it should be char instead. Usual OS network APIs are char based, and the fact that the network is technically about "octets" shouldn't matter behind those APIs. Or is my thinking wrong?
[+] ben0x539|7 years ago|reply
I don't think I disagree, but could you articulate how strict aliasing makes this code wrong? Maybe I'm not looking at the right code, but it's not immediately obvious from the post to me.
[+] microcolonel|7 years ago|reply
> The byte order of the computer doesn't matter much at all except to compiler writers and the like

This statement is too generic to mean anything helpful.

I think the point our friend Rob Pike is making is that you could write general purpose code which accomplishes the task, and on a sufficiently advanced compiler, the swap may be eliminated where applicable; but when the point is made this way, it surely just pisses off people who know better.

C programmers are not (quite) children, you don't need to eradicate nuance to keep people reading.

[+] jmartinpetersen|7 years ago|reply
The article isn't really about the bug but about the (lack of) quality in the code. Spoiler alert: He doesn't like it.

"This code has no redeeming features. It must be thrown away and rewritten yet again. This time by an experienced programmer who know what error codes mean, how to use asserts properly, and most of all, who has experience at network programming."

[+] tinus_hn|7 years ago|reply
If the rest of the code looks anywhere near similar to the examples he shows, he's right: this is terrible code, and it should never have been allowed into a serious project in this day and age.
[+] flohofwoe|7 years ago|reply
I agree with most of the critique, but I don't quite get the part about the use of assert.

The code in question quite clearly doesn't use C's standard assert macro, but a custom macro called assert_return (which I guess checks for a condition, and returns from the function instead of aborting the program).

So basically:

    if (!condition) {
        return err;
    }
How else would one check for invalid input to the function?
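Presumably it expands to something like this sketch (a guess at the shape; systemd's real macro also logs, and parse_option and its checks here are made up for illustration):

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical assert_return-style macro: check a condition and
   return an error code from the current function instead of aborting. */
#define assert_return(condition, retval) \
    do {                                 \
        if (!(condition))                \
            return (retval);             \
    } while (0)

/* Made-up example use, mirroring the if/return pattern above. */
int parse_option(const void *buf, size_t len)
{
    assert_return(buf != NULL, -EINVAL);
    assert_return(len >= 4, -EINVAL);
    /* ... real parsing would go here ... */
    return 0;
}
```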
[+] bjourne|7 years ago|reply
I don't know the details of DHCP, but there is a difference between values supplied by the programmer and those supplied by the user. According to the author, checking if a pointer is not-null is checking a programmer error (EINVAL) so it makes sense to use asserts. But the optval string in the example is supplied by the user so it is insane to use asserts to check for null-termination.

Asserts should only be used to signal "this code is broken" but that is not how they are used here.

A long while ago the GTK+ library used to suffer from a similar affliction. Mainstay GTK+ programs such as GEdit and Totem would print lots of warnings on the console. Since the GTK+ developers used the same facility (g_warning) for reporting both coding errors and user errors like "file not found," debugging code was hard. Was the warning "supposed to be there" or was it a bug? GTK+ programs are much cleaner these days, even if some programs still fail to make the distinction between user and programmer errors.

[+] viraptor|7 years ago|reply
I may have a different understanding than the author, but for me the difference in the name means a lot. The fragment you wrote is clear and simple. But if I see assert_..., then I immediately have questions like: Where's the message printed and does it have enough context for debugging at that point? Is this affected by NDEBUG? Will it run in release builds? Does assert_se() kill the whole application?

Assert already has a meaning and using it in this context is not great.

[+] XorNot|7 years ago|reply
I can see the argument to be made about overloading the word "assert" in the language though.

i.e. "assert" meaning "check something in this code is algorithmically correct"

Whereas when you're checking user input, use a very different term - i.e. validate or something.

So a fix would be to rename the macro to be obvious about its usage - "validate_return" or "valid_or_return" or something.

To be fair I can see the problem because I do tend to write code comments saying "assert everything makes sense" when checking user input sometimes and "assert" itself doesn't properly convey the outcome if it fails - i.e. "abort_if" or "crash_if" would be a much more descriptive term.

[+] corndoge|7 years ago|reply
Probably returns a value if the assert works, and aborts otherwise.

I would go so far as to say that using asserts in input validation is a good thing, in very limited quantities. I use them in places where the input should have been validated so long ago, so many functions back, that if something isn't as I expect here, I seriously fucked up somewhere else and the whole system has a good likelihood of being compromised by something unrelated to the condition I'm asserting. In that scenario I'd rather log and abort.

[+] FeepingCreature|7 years ago|reply
In D, we call the two functions "assert" and "enforce". The point is that assert is the wrong term; to say "assert(condition)" says "I assert that condition should hold here". If there's an input event that causes condition to be false then your assert is not just false but wrong, since you assert something that is not necessarily true.
[+] stinos|7 years ago|reply
So this is a whole rant, and it's pretty much spot on, probably, but the reader is left with a huge elephant of a question in the room: what is the proper way to do it, then? It has samples all over the place of what's bad, telling us those should be rewritten. Why not show some C/C++ code samples of the proper way to do it, then?
[+] asdfasgasdgasdg|7 years ago|reply
> Why not show some C/C++ code samples of the proper way to do it, then?

He provides constructive examples for all of the items that he criticizes.

- Instead of pointer arithmetic, base_ptr, offset, len.

- Instead of casting to internal structs, parsing into internal structs.

- Instead of network byte order, buf[offset] + buf[offset+1] * 256 .

- Instead of the wrong error codes, the right ones.

- Instead of random asserts over internal data structures, validation at parse time.
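The "parsing into internal structs" item, for instance, might look like this sketch (struct and field names are hypothetical, not from systemd): copy each field out of the buffer explicitly, rather than overlaying a struct on the wire bytes.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical internal representation -- an ordinary struct,
   never cast onto the wire buffer. */
struct option_header {
    uint16_t code;
    uint16_t len;
};

/* Parse a 4-byte option header from a raw buffer. Returns 0 on
   success, -1 if the buffer is too short. Byte order is handled
   arithmetically, so host endianness never enters the picture. */
int parse_option_header(const uint8_t *buf, size_t buflen,
                        struct option_header *out)
{
    if (buflen < 4)
        return -1;
    out->code = (uint16_t)((buf[0] << 8) | buf[1]); /* network order */
    out->len  = (uint16_t)((buf[2] << 8) | buf[3]);
    return 0;
}
```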

[+] camgunz|7 years ago|reply
The main issue to me is that conversion, parsing, and validation are scattered throughout the code when they should be in separate, local, sequential steps. You should never be in a situation where you're halfway through allocating memory for an internal structure when suddenly you realize your input is malformed, for three reasons:

- In order to make a good decision about what to do next, you need a lot of context: what you need to clean up, what any calling function expects you to do, any internal state you need to update, etc. You either end up passing a lot of extraneous context info around, or you make everything global.

- Alternatively you bail with a status code and mutter "yolo" to yourself, or you throw an assert -- but crashing a system daemon with input is the essence of a DoS, so only the truly ignorant or lazy resort to it.

- Building a reasonable framework is hard when all your concerns are entwined like this. If at some point you decide asserts on input are bad, it's very hard to change that convention in this code. Essentially all your function signatures and your entire call graph change. There's an extraordinary amount of technical debt here; just look at all the places Lennart had to change the size/offset stuff. The logic for this is scattered all over the place.

People have focused on some of the tactical issues -- asserts on input, byte order conversion, etc. -- but the real issue is that when you write code in this style those things don't jump out at you because you have to do them everywhere. If your code had a better separation of concerns, as soon as you saw 'offsetof' in a parsing function you'd immediately see a red flag. It's so much easier to think in these specific contexts, where you're assured previous steps are complete, rather than in every single function having to get the entire conversion/validation/parsing flow correct every single time, keeping the entire thing in your head always.
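A minimal sketch of that separation, with hypothetical names: a validation pass that walks the type-length-value options up front, so the parsing stage that follows can assume well-formed input and never hits a mid-allocation bail-out.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Step 1: validate only. Walk the options (2-byte code, 2-byte
   big-endian length, then payload) and check every length against
   the remaining buffer. No allocation, no state changes. */
bool options_valid(const uint8_t *buf, size_t len)
{
    size_t off = 0;
    while (off < len) {
        if (len - off < 4)            /* room for code + length? */
            return false;
        size_t optlen = (size_t)((buf[off + 2] << 8) | buf[off + 3]);
        if (optlen > len - off - 4)   /* room for the payload? */
            return false;
        off += 4 + optlen;
    }
    return true;
}

/* Step 2 (not shown): parse, assuming options_valid() returned true.
   Inside that step an out-of-bounds length really is a programmer
   error, so there an assert() would finally be appropriate. */
```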

[+] romed|7 years ago|reply
I'm kinda bothered by the "C/C++" indistinction of your request. This code would look very, very different in C++.
[+] mackal|7 years ago|reply
Blog makes an error in their assumption of what assert_return() is ... It's not a macro around assert() or related.

Possibly a bad idea on systemd's part to reuse the name assert, but the code isn't doing what the blog assumes it does.

[+] chris_wot|7 years ago|reply
As an aside, I find it a little ironic that I found it hard to parse the headline "Systemd is bad parsing". No, Systemd is not bad parsing, Systemd is parsing badly.
[+] hyperpape|7 years ago|reply
The original is “Systemd is bad parsing and should feel bad”, which sounds a little weird, but it’s a meme, so there are lower standards.

I guess someone felt like it shouldn’t be a meme headline on HN, but the result is nonsense.

[+] philipwhiuk|7 years ago|reply
I assume it should have read "Systemd is bad at parsing"
[+] polotics|7 years ago|reply
So it's true what some-of-you'all been saying all along about systemd!

;-D

One question: any smell of intentional obfuscation for this vulnerability, or is it impossible to tell?

[+] dsr_|7 years ago|reply
Here's a thing to ponder:

If this were not buried in systemd, but was part of a standalone dhcpd package, would it have been done better or caught earlier?

One of the major criticisms of systemd is that it wants to take over all system management functions. In order to do that it needs to offer at least minimally functional versions of all the services it pushes out. It should not be news to anyone that trying to do everything instead of integrating with existing projects leads to sloppiness.

This incident is evidence suggesting that other serious issues are lurking in systemd as a result of the actions naturally arising from their policy.

[+] blakesterz|7 years ago|reply
>> One question: any smell of intentional obfuscation for this vulnerability, or is it impossible to tell?

I always think that anytime there's an interesting bug in ANYTHING. I'm not an expert, but I would think it would be nearly impossible to tell the difference between "someone made a mistake" and "someone made this look like a mistake". I would think any decent programmer with malicious intent could do that anytime they wanted.

[+] eeZah7Ux|7 years ago|reply
> In the late 1990s and early 2000s, we learned that parsing input is a problem.

What? It was known well in advance.

[+] rphlx|7 years ago|reply
For some further, semi-amusing support for the title's assertion, save all of your work, close important programs, and then run 'sudo systemctl --dry-run reboot'
[+] zwkrt|7 years ago|reply
Other people's gripes aside, I enjoy that the authors approach to avoiding out-of-bounds errors almost exactly mirrors Java's nio Buffer interface.
[+] stmw|7 years ago|reply
Article declares that "What you should do instead is parse data the same way as if you were writing code in JavaScript," followed by "pointer arithmetic is wrong" (paraphrasing). That seems a bit extreme...
[+] MichaelMoser123|7 years ago|reply
gcc runs with -fno-permissive by default; it will give you a warning if you implicitly convert between pointers of different types. (Turn warnings into errors with -Werror so that you have to deal with these messages.)

Now what I don't understand is that for C you are not allowed to disable this: -fpermissive is not allowed for C, whereas for C++ you can. Does anyone know the reason why? The compiler error says "command line option "-fpermissive" is valid for C++/ObjC++ but not for C" - why does the compiler make this distinction?

[+] exabrial|7 years ago|reply
I typically disable ipv6 unless I specifically plan to offer/use it as a public service. Just more things to keep an eye on for no benefit to the actual business.
[+] dvfjsdhgfv|7 years ago|reply
Just like me and many other sysadmins. What amazes me is that the people who designed IPv6 still don't get it and seem surprised that it hasn't taken over the Internet yet.
[+] verytrivial|7 years ago|reply
This article started out really well, saying you shouldn't e.g. byte-swap as part of the overall task at hand, then wanders into comparisons with other languages. I had hoped it was instead going to say that you should use some verified parsing tool to unpack data. All the other languages, at some level, will byte swap! It is the choice of the level of abstraction (or lack thereof) which is the systemd bug here, not the choice of a C family language IMNSHO.
[+] IshKebab|7 years ago|reply
This is the first reasonable criticism of systemd I've heard. That is very bad code.
[+] bvinc|7 years ago|reply
He talks like the way he writes code has been established as best practice for decades, and if he can just write enough blog posts and convince enough people, maybe we can solve this problem once and for all.

I think the only way to have high assurances is to program under a system that doesn't have these classes of errors. I don't care if it's a c static checker or a new language or what.

Unfortunately the long term solution to these problems in system programming seems to be rust. But it's going to take decades for it to be established and I'm going to look like a language hipster member of the rust evangelism strike force every time I bring it up.

But seriously, it's been decades and decades and we're still writing buffer overflows. What is the long term final nail in the coffin for buffer overflows?

[+] xtrapolate|7 years ago|reply
> "The other thing that you shouldn't do, even though C/C++ allows it, is pointer arithmetic. Again, it's one of those epiphany things C programmers remember from their early days. It's something they just couldn't grasp until one day they did, and then they fell in love with it. Except it's bad. The reason you struggled to grok it is because it's stupid and you shouldn't be using it. No other language has it, because it's bad."

Honestly, this is a pointless rant. "Do this, but don't do that (because that's stupid)".

> "No other language has it, because it's bad"

The reason you "don't have pointer arithmetic in other languages" is - it's been abstracted away from you, for your convenience, so that you can just concentrate on your business logic and not have to worry about the rest. You may not deal with pointer arithmetic, but the people developing your JVM/CPython/V8/Chakra sure do. Author is simply ranting about the possibility of abusing/misusing the mechanisms offered by C/C++.

[+] zwkrt|7 years ago|reply
Programming as a profession is strange because we each do such different things, but all tools are available to us all the time. Imagine if this were true in the physical world; you would read articles written by accountants like "Dynamite Considered Harmful" with rebuttals titled "Why Dynamite Matters" by construction engineers...
[+] ben0x539|7 years ago|reply
Do you just take issue with the specific claim that no language should have pointer arithmetic, or do you disagree with the more general thrust of the post?

Sure, it's ranty and, like, maybe closer to op-ed than computer science publication, but I think even among the rarefied breed of people developing your JVM/CPython/V8/Chakra, you could easily find approval for the notion that pointer arithmetic is bad(*) and should be avoided as much as possible.

[+] vmchale|7 years ago|reply
Funnily enough, ATS has pointer arithmetic and it can be safe if you declare the appropriate types.

It seems kind of nuts to say array indexing isn't just a fancier version of pointer arithmetic.

[+] gaius|7 years ago|reply
Additionally, I don’t think it’s reasonable to extrapolate from the mess that is systemd to all C/C++ code. Some fantastically reliable systems have been developed with them.
[+] a3_nm|7 years ago|reply
If like me you are looking for the CVE and not for commentary about the affected code, here it is: https://nvd.nist.gov/vuln/detail/CVE-2018-15688

FWIW, from https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=912008 it looks like the vulnerable code is not enabled by default on Debian.