My 'favourite' bit of surprising (not undefined) behaviour I've seen recently in the C11 spec is around infinite loops, where
void foo() { while (1) {} }
will loop forever, but
void foo(int i) { while (i) {} }
is permitted to terminate...even if i is 1:
> An iteration statement whose controlling expression is not a constant expression, that performs no input/output operations, does not access volatile objects, and performs no synchronization or atomic operations in its body, controlling expression, or (in the case of a for statement) its expression-3, may be assumed by the implementation to terminate
It means that empty loops (loops with empty bodies) can be completely removed if the controlling expression has no side effects.
> This is intended to allow compiler transformations such as removal of empty loops even when termination cannot be proven.
It means while(i) {} can be eliminated as if i were 0, because there are no side effects in the loop expression or the loop body, and what would be the point of the loop if it never terminated on a non-constant expression?
The optimizer is allowed to eliminate it as a useless loop with no side effects. If you really want an infinite loop, you can use while (1) {}.
There are cases where automatically generated C code might have empty loops which are useless.
If you really want to go to sleep, use pause() or similar. An infinite loop eats up CPU cycles.
This definition is actually required for the correctness of many standard compiler optimizations such as partial redundancy elimination and code motion.
I'll be honest, I didn't find any of these to be particularly surprising. If you've been using C and are familiar with strict aliasing and common UB issues, I wouldn't expect any of these questions to seriously trip you up. Number 2 is probably the one most people are unlikely to guess, but that example has also been beaten to death so much since it started happening that I think lots of people (or at least, the people likely to read this) have already seen it before.
I'd also add that there are ways to 'get around' some of these issues if necessary - for example, gcc has a flag for disabling strict-aliasing (-fno-strict-aliasing) and a flag for 2's-complement signed-integer wrapping (-fwrapv).
I don't think #2 has been fully beaten to death yet.
Assuming a platform where you don't segfault (say that 'page 0' addresses are valid) and thus execution does proceed, I still can't think of any valid reason to eliminate the if that follows (line 2 in the snippet).
Under what set of logic does being able to dereference a pointer establish that its value is not 0 (which is what the test checks)?
In my opinion that is an often-working but incorrect optimization.
Yeah, I agree. I used to write C full time, but haven't in around 6 years, and I only flubbed #11 & #12 (I knew there was undefined behavior but couldn't remember why; after reading the answers I was like "duh", esp for #12 after having read #11).
I've never actually run into #2 in practice, though: even at -O3 the dereference in line 1 has always crashed for me, though I guess probably because I've never written code for an OS where an address of 0 is valid and doesn't cause a SIGSEGV or similar.
What's the best way to "fix" strict aliasing without disabling the undefined behavior around it? Using a union?
Yes, like most of the "undefined behaviour allows your computer to format the disk"-style posts this one seems to be written by a programmer with novice-intermediate C knowledge.
What irks me is the intro >> The purpose of this article is to make everyone (especially C programmers) say: “I do not know C”. <<
I think the purpose of the article was mainly for the author to write down some things he learned. Apparently it was his expectation that readers wouldn't be able to answer the quiz.
However, if you can't answer (at least most) of these questions correctly, you're _not_ an expert C programmer.
So I think the correct intro here should be "The purpose of this blog post is to show that if you want to learn C, you actually have to learn it and should not attempt to 'wing it'".
...and maybe also that you should not write patronizing blog posts about a topic which you haven't fully grasped yet yourself.
Not a full-time C programmer, and I was still correct on all of them except #1. Certainly C is more dangerous than other languages, but I don't understand the push to convince people that it is impossible to understand.
I don't think most C programmers share your depth of the language. I tried hard to explain strict aliasing once, and utterly failed. The dev was convinced that he knew the exact behavior of the platform, and that it was fine. Yet people constantly find examples where we "know" what the compiler will do, and it does something completely different.
It killed the one thing C was good at - simplicity (you knew exactly what happens where; note I'm not saying speed, as C++ can be quite a bit faster than C).
> Now, due to language lawyering, you can't just know C and your CPU, you have to know your compiler (and every iteration of it!).
This mythical time never existed. You always had to know your compiler -- C simply isn't well specified enough that you can accurately predict the meaning of many constructs without reference to the implementation you're using.
It used to, if anything, be much, much worse, with different compilers on different platforms behaving drastically differently.
Optimisers are what made C what it is: they convert the idealised PDP-11 assembly into something efficient on modern computers, and speed is something C programmers care about.
If you do know your compiler and your CPU (singular), you're probably not really programming C.
Conversely, if you maintain software that compiles on a bunch of compilers, operating systems and architectures (particularly little endian + big endian, 32 bit + 64 bit), then it's probably written in something rather like C. A lot of people do this.
I don't think this Q&A format makes for a good case of not knowing C.
I mean I got all answers right without thinking about them too much, but would I too if I had to review hundreds of lines of someone else's code? What about if I'm tired?
It's easy to spot mistakes in isolated code pieces, especially if the question already tells you more or less what's wrong with it. But that doesn't mean you'll spot those mistakes in a real codebase (or even when you write such code yourself).
This is further compounded by how difficult it is to build useful abstractions in C, meaning that much real-world C consists of common patterns, and reviewers focus on recognizing common patterns, which increases the chances that small things slip through code review.
Agreed that these little examples aren't too difficult, especially if you have experience, but I certainly do not envy Linus Torvalds' job.
It's worth noting that for example #12, the assert will only fire in debug builds (i.e. when the macro NDEBUG is not defined). So, depending on how the source is compiled, it may be possible to invoke the div function with b == 0.
IMHO the problem is with compilers (and their developers) who think UB really means they can do anything, when what programmers usually expect is what the standard itself notes as one of the possible interpretations of UB: "behaving during translation or program execution in a documented manner characteristic of the environment".
>the problem is with compilers (and their developers) who think UB really means they can do anything
But that's exactly what undefined behavior means.
The actual problem is that programmers are surprised-- that is, programmers' expectations are not aligned with the actual behavior of the system. More precisely, the misalignment is not between the actual behavior and the specified behavior (any actual behavior is valid when the specified behavior is undefined, by definition), but between the specified behavior and the programmers' expectations.
In other words, the compiler is not at fault for doing surprising things in cases where the behavior is undefined; that's the entire point of undefined behavior. It's the language that's at fault for specifying the behavior as undefined.
In other other words, if programmers need to be able to rely on certain behaviors, then those behaviors should be part of the specification.
People have been a little sloppy with the terms, but there's a difference between implementation defined behavior and undefined behavior. Generally, the committee allows undefined behavior when it doesn't believe a compiler can detect a bug cheaply.
Of course, many programmers complain about how the committee defines "cheaply." Trying to access an invalid array index is undefined because the way to prevent that kind of bug would be to add range checking to every array access. So, each extra check isn't expensive, but the committee decided that requiring a check on every array access would be too expensive overall. The same applies to automatically detecting NULL pointers.
And the fact that the standard doesn't require a lot -- a C program might not have an operating system underneath it, or might be compiled for a CPU that doesn't offer memory protection -- means that the committee's idea of "expensive" isn't necessarily based on whatever platforms you're familiar with.
But it is certainly true that a compiler can add the checks, or can declare that it will generate code that acts reliably even though the standard doesn't require it. And it's even true that compilers often have command line switches specifically for that purpose. But in general I believe those switches make things worse: your program isn't actually portable to other compilers, and when somebody tries to run your code through a different compiler, there's a very good chance they won't get any warnings that the binary won't act as expected.
Why restrict yourself to one compiler if you can write portable code?
Clang and gcc provide flags that enable nonstandard behavior, and you can use static and dynamic (asan, ubsan) tools to detect errors in your code; it does not have to be hard to write correct code.
In the main, people seem to be unfamiliar with what lies underneath C, so they never really get the idea that you might be able to (or want to) expect any behaviour beyond what the language definition itself imposes.
Sorta. I write mostly Go (some JS, PHP) and I got 6/10, forgetting mostly stupid stuff like passing (INT_MIN, -1) to #12.
But some of those are prevalent in Go. For example, 1.0 / 1e-309 is +Inf in Go, just as it is in C—it's IEEE 754 rules. int might not always be able to hold the size of an object in Go, just like C. In Go #6 wraps around and is an infinite loop, just like C.
The questions that don't, in some way, translate to Go are #2, #7, #8, and #10.
But, to your credit, I do like how Go has very limited UB (basically race conditions + some uses of the unsafe package) and works pretty much how you'd expect it to work.
1. Unless C's variable definition rules are completely different from C++'s, int i; is a full definition, not a declaration. If both definitions appear at the same scope (e.g. global), this will cause either a compiler error or a linker error. A variable declaration would be extern int i;
As a former C programmer, you know not to fool around at the max bounds of a type. That avoids all of the integer overflow/underflow conditions. When in doubt, you just throw a long or unsigned on there for insurance. :)
I got every single one right. Does that mean I know C through and through? Perhaps. But all of these are the 'default' FAQ pitfalls of C, not the really tricky stuff.
I made this post as a response. Disclaimer: yet another programming language trying to dethrone C. People seem to be less enthusiastic about the subject these days.
I feel bad because I'm smart enough to answer these questions correctly in a quiz format, but if I saw any of them in production code, I would not even think twice about it.
(the quiz questions themselves lead you on, plus I read the MIT paper on undefined behavior that was posted on here back in 2013)
That's not a sequence point violation. The C standard makes it clear that zp gets xp + *yp prior to the increment. Quoting 6.5.2.4
> The result of the postfix ++ operator is the value of the operand. After the result is obtained, the value of the operand is incremented. (That is, the value 1 of the appropriate type is added to it.) See the discussions of additive operators and compound assignment for information on constraints, types, and conversions and the effects of operations on pointers. The side effect of updating the stored value of the operand shall occur between the previous and the next sequence point.
aidanhs | 9 years ago:
To make things a bit worse, llvm can incorrectly make both of the loops above terminate - https://bugs.llvm.org//show_bug.cgi?id=965.
junk_disposal | 9 years ago:
Now, due to language lawyering, you can't just know C and your CPU, you have to know your compiler (and every iteration of it!). And if you slip somewhere, your security checks blow up (http://blog.regehr.org/archives/970 https://bugs.chromium.org/p/nativeclient/issues/detail?id=24...) .
userbinator | 9 years ago:
Related reading:
http://blog.metaobject.com/2014/04/cc-osmartass.html
http://blog.regehr.org/archives/1180 and https://news.ycombinator.com/item?id=8233484
AndyKelley | 9 years ago:
http://andrewkelley.me/post/zig-already-more-knowable-than-c...
rdc12 | 9 years ago:
zp++ = xp + *yp;
msbarnett | 9 years ago:
> The result of the postfix ++ operator is the value of the operand. After the result is obtained, the value of the operand is incremented. (That is, the value 1 of the appropriate type is added to it.) See the discussions of additive operators and compound assignment for information on constraints, types, and conversions and the effects of operations on pointers. The side effect of updating the stored value of the operand shall occur between the previous and the next sequence point.
The last sentence is key.