My 'favourite' bit of surprising (not undefined) behaviour I've seen recently in the C11 spec is around infinite loops, where
void foo() { while (1) {} }
will loop forever, but
void foo(int i) { while (i) {} }
is permitted to terminate...even if i is 1:
> An iteration statement whose controlling expression is not a constant expression, that performs no input/output operations, does not access volatile objects, and performs no synchronization or atomic operations in its body, controlling expression, or (in the case of a for statement) its expression-3, may be assumed by the implementation to terminate
It means that empty loops (loops with empty bodies) can be completely removed if the controlling expression has no side effects.
> This is intended to allow compiler transformations such as removal of empty loops even when termination cannot be proven.
It means while(i) {} can be eliminated as if i were 0, because there are no side effects in the loop expression or the loop body, and what would be the point of the loop if it never terminated on a non-constant expression?
The optimizer is allowed to eliminate it as a useless loop with no side effects. If you really want an infinite loop, you can use while (1) {}.
There are cases where automatically generated C code might have empty loops which are useless.
If you really want to go to sleep, use pause() or similar. An infinite loop eats up CPU cycles.
This definition is actually required for the correctness of many standard compiler optimizations such as partial redundancy elimination and code motion.
I'll be honest, I didn't find any of these to be particularly surprising. If you've been using C and are familiar with strict aliasing and common UB issues, I wouldn't expect any of these questions to seriously trip you up. Number 2 is probably the one most people are unlikely to guess, but that example has also been beaten to death so much since it started happening that I think lots of people (or at least, the people likely to read this) have already seen it before.
I'd also add that there are ways to 'get around' some of these issues if necessary - for example, gcc has a flag for disabling strict-aliasing (-fno-strict-aliasing) and a flag for 2's-complement signed-integer wrapping (-fwrapv).
I don't think #2 has been fully beaten to death yet.
Assuming a platform where you don't segfault (say that 'page 0' addresses are valid) and thus execution does proceed, I still can't think of any valid reason to eliminate the if that follows (line 2 in the snippet).
Under what set of logic does being able to dereference a pointer establish that its value is not 0 (which is what the test checks)?
In my opinion that is an often-working but incorrect optimization.
Yeah, I agree. I used to write C full time, but haven't in around 6 years, and I only flubbed #11 & #12 (I knew there was undefined behavior but couldn't remember why; after reading the answers I was like "duh", esp for #12 after having read #11).
I've never actually run into #2 in practice, though: even at -O3 the dereference in line 1 has always crashed for me, though I guess probably because I've never written code for an OS where an address of 0 is valid and doesn't cause a SIGSEGV or similar.
What's the best way to "fix" strict aliasing without disabling the undefined behavior around it? Using a union?
Yes, like most of the "undefined behaviour allows your computer to format the disk"-style posts this one seems to be written by a programmer with novice-intermediate C knowledge.
What irks me is the intro >> The purpose of this article is to make everyone (especially C programmers) say: “I do not know C”. <<
I think the purpose of the article was mainly for the author to write down some things he learned. Apparently it was his expectation that readers wouldn't be able to answer the quiz.
However, if you can't answer (at least most) of these questions correctly, you're _not_ an expert C programmer.
So I think the correct intro here should be "The purpose of this blog post is to show that if you want to learn C, you actually have to learn it and should not attempt to 'wing it'".
...and maybe also that you should not write patronizing blog posts about a topic which you haven't fully grasped yet yourself.
Not a full-time C programmer, and I was still correct on all of them except #1. Certainly C is more dangerous than other languages, but I don't understand the push to convince people that it is impossible to understand.
I don't think most C programmers share your depth of the language. I tried hard to explain strict aliasing once, and utterly failed. The dev was convinced that he knew the exact behavior of the platform, and that it was fine. Yet people constantly find examples where we "know" what the compiler will do, and it does something completely different.
It killed the one thing C was good at - simplicity (you knew exactly what happens where; note I'm not saying speed, as C++ can be quite a bit faster than C).
> Now, due to language lawyering, you can't just know C and your CPU, you have to know your compiler (and every iteration of it!).
This mythical time never existed. You always had to know your compiler -- C simply isn't well specified enough that you can accurately predict the meaning of many constructs without reference to the implementation you're using.
It used to, if anything, be much, much worse, with different compilers on different platforms behaving drastically differently.
Optimisers are what made C what it is: they convert the idealised PDP-11 assembly into something efficient on modern computers, and speed is something C programmers care about.
If you do know your compiler and your CPU (singular), you're probably not really programming C.
Conversely, if you maintain software that compiles on a bunch of compilers, operating systems and architectures (particularly little endian + big endian, 32 bit + 64 bit), then it's probably written in something rather like C. A lot of people do this.
I don't think this Q&A format makes for a good case of not knowing C.
I mean I got all answers right without thinking about them too much, but would I too if I had to review hundreds of lines of someone else's code? What about if I'm tired?
It's easy to spot mistakes in isolated code pieces, especially if the question already tells you more or less what's wrong with it. But that doesn't mean you'll spot those mistakes in a real codebase (or even when you write such code yourself).
This is further compounded by how difficult it is to build useful abstractions in C, meaning that much real-world C consists of common patterns, and reviewers focus on recognizing common patterns, which increases the chances that small things slip through code review.
Agreed that these little examples aren't too difficult, especially if you have experience, but I certainly do not envy Linus Torvalds' job.
It's worth noting that for example #12, the assert will only fire in debug builds (i.e. when the macro NDEBUG is not defined). So, depending on how the source is compiled, it may be possible to invoke the div function with b == 0.
IMHO the problem is with compilers (and their developers) who think UB really means they can do anything, when what programmers usually expect is what the standard itself notes as one of the possible interpretations of UB: "behaving during translation or program execution in a documented manner characteristic of the environment".
>the problem is with compilers (and their developers) who think UB really means they can do anything
But that's exactly what undefined behavior means.
The actual problem is that programmers are surprised-- that is, programmers' expectations are not aligned with the actual behavior of the system. More precisely, the misalignment is not between the actual behavior and the specified behavior (any actual behavior is valid when the specified behavior is undefined, by definition), but between the specified behavior and the programmers' expectations.
In other words, the compiler is not at fault for doing surprising things in cases where the behavior is undefined; that's the entire point of undefined behavior. It's the language that's at fault for specifying the behavior as undefined.
In other other words, if programmers need to be able to rely on certain behaviors, then those behaviors should be part of the specification.
People have been a little sloppy with the terms, but there's a difference between implementation defined behavior and undefined behavior. Generally, the committee allows undefined behavior when it doesn't believe a compiler can detect a bug cheaply.
Of course, many programmers complain about how the committee defines "cheaply." Trying to access an invalid array index is undefined because the way to prevent that kind of bug would be to add range checking to every array access. So, each extra check isn't expensive, but the committee decided that requiring a check on every array access would be too expensive overall. The same applies to automatically detecting NULL pointers.
And the fact that the standard doesn't require a lot -- a C program might not have an operating system underneath it, or might be compiled for a CPU that doesn't offer memory protection -- means that the committee's idea of "expensive" isn't necessarily based on whatever platforms you're familiar with.
But it is certainly true that a compiler can add the checks, or can declare that it will generate code that acts reliably even though the standard doesn't require it. And it's even true that compilers often have command line switches specifically for that purpose. But in general I believe those switches make things worse: your program isn't actually portable to other compilers, and when somebody tries to run your code through a different compiler, there's a very good chance they won't get any warnings that the binary won't act as expected.
Why restrict yourself to one compiler if you can write portable code?
Clang and gcc provide flags that enable nonstandard behavior, and you can use static and dynamic (asan, ubsan) tools to detect errors in your code; it does not have to be hard to write correct code.
In the main, people seem to be unfamiliar with what lies underneath C, so they never really get the idea that you might be able to (or want to) expect any behaviour beyond what the language definition itself imposes.
Sorta. I write mostly Go (some JS, PHP) and I got 6/10, forgetting mostly stupid stuff like passing (INT_MIN, -1) to #12.
But some of those are prevalent in Go. For example, 1.0 / 1e-309 is +Inf in Go, just as it is in C—it's IEEE 754 rules. int might not always be able to hold the size of an object in Go, just like C. In Go #6 wraps around and is an infinite loop, just like C.
The questions that don't, in some way, translate to Go are #2, #7, #8, and #10.
But, to your credit, I do like how Go has very limited UB (basically race conditions + some uses of the unsafe package) and works pretty much how you'd expect it to work.
1. Unless C's variable definition rules are completely different from C++'s, int i; is a full definition, not a declaration. If both definitions appear at the same scope (e.g. global), this will cause either a compiler error or a linker error. A variable declaration would be extern int i;
As a former C programmer, you know not to fool around at the max bounds of a type. That avoids all of the integer overflow/underflow conditions. When in doubt, you just throw a long or unsigned on there for insurance. :)
I got every single one right. Does that mean I know C through and through? Perhaps. But all of these are the 'default' FAQ pitfalls of C, not the really tricky stuff.
I made this post as a response. Disclaimer: yet another programming language trying to dethrone C. People seem to be less enthusiastic about the subject these days.
I feel bad because I'm smart enough to answer these questions correctly in a quiz format, but if I saw any of them in production code, I would not even think twice about it.
(the quiz questions themselves lead you on, plus I read the MIT paper on undefined behavior that was posted on here back in 2013)
That's not a sequence point violation. The C standard makes it clear that zp gets xp + *yp prior to the increment. Quoting 6.5.2.4
> The result of the postfix ++ operator is the value of the operand. After the result is obtained, the value of the operand is incremented. (That is, the value 1 of the appropriate type is added to it.) See the discussions of additive operators and compound assignment for information on constraints, types, and conversions and the effects of operations on pointers. The side effect of updating the stored value of the operand shall occur between the previous and the next sequence point.
aidanhs | 9 years ago:
To make things a bit worse, llvm can incorrectly make both of the loops above terminate - https://bugs.llvm.org//show_bug.cgi?id=965.
junk_disposal | 9 years ago:
Now, due to language lawyering, you can't just know C and your CPU, you have to know your compiler (and every iteration of it!). And if you slip somewhere, your security checks blow up (http://blog.regehr.org/archives/970 https://bugs.chromium.org/p/nativeclient/issues/detail?id=24...) .
userbinator | 9 years ago:
Related reading:
http://blog.metaobject.com/2014/04/cc-osmartass.html
http://blog.regehr.org/archives/1180 and https://news.ycombinator.com/item?id=8233484
AndyKelley | 9 years ago:
http://andrewkelley.me/post/zig-already-more-knowable-than-c...
rdc12 | 9 years ago:
zp++ = xp + *yp;
msbarnett | 9 years ago:
> The result of the postfix ++ operator is the value of the operand. After the result is obtained, the value of the operand is incremented. (That is, the value 1 of the appropriate type is added to it.) See the discussions of additive operators and compound assignment for information on constraints, types, and conversions and the effects of operations on pointers. The side effect of updating the stored value of the operand shall occur between the previous and the next sequence point.
The last sentence is key.