Strange C Syntax

110 points | Tideflat | 10 years ago | blog.robertelder.org

55 comments

[+] cjslep|10 years ago|reply
These strange syntaxes are perfect additions for the article "How to write unmaintainable code", which already has a Duff's device example:

    switch(count % 8) {
        case 0: do{ putchar('0' + (int)j);
        case 7:     putchar('0' + (int)j);
        case 6:     putchar('0' + (int)j); /* Unrolled
        case 5:     putchar('0' + (int)j);  * for greater
        case 4:     putchar('0' + (int)j);  * speed.
        case 3:     putchar('0' + (int)j);  */
        case 2:     putchar('0' + (int)j);
        case 1:     putchar('0' + (int)j);
    } while(--j > 0);
Without syntax highlighting, a passing glance may not reveal that cases 3 through 5 are commented out.
[+] sago|10 years ago|reply
Seems an odd example: any code with a multiline comment can have the same issue. That's nothing to do with the strangeness of Duff's device. It would work the same in a series of function calls or calculations.

And even then it is a matter of experience. A multiline comment that is not in its own 'paragraph' of the code looks odd to me, so it draws the eye right away. Perhaps this would foil some programmers, but not more than once, I'd have thought. In my experience, run-on compound statements are much more common and harder to spot intuitively:

    if (foo)
        bar();
        sun();
Duff's device is difficult to understand from first principles, but even that is a bad example of unmaintainable code because a) it looks like nothing except Duff's device, you only need to see the pattern once or twice and you'd recognise it, and at least know 'it's that weird pattern for unrolling loops', and b) it is a performance optimisation that only belongs in code that is profiled and needs to go that fast. As such it should be well commented to avoid regressions by well meaning refactor-zealots. Inline assembly or heavy intrinsics are more difficult to read than regular C too, so you only use them when you need to. In my experience manual loop unrolling is very rarely needed.
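
For reference, a conventional Duff's device might look like the sketch below. It is the memory-to-memory variant (the original wrote to a single output register), and the name `duff_copy` and the early return for a zero count are my own assumptions, not something taken from the article or the comments above.

```c
#include <stddef.h>

/* Copies count bytes from src to dst, eight per pass through the loop.
 * The switch jumps into the middle of the do-while so that the
 * count % 8 leftover bytes are handled on the first, partial pass. */
void duff_copy(char *dst, const char *src, size_t count)
{
    if (count == 0)
        return;                     /* the loop below assumes count > 0 */

    size_t n = (count + 7) / 8;     /* number of passes through the loop */
    switch (count % 8) {
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--n > 0);
    }
}
```

Note that the loop counter `n` is derived from the same `count` that the switch inspects, which is what keeps the partial first pass and the full passes consistent.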
[+] buserror|10 years ago|reply
Duff's device is not just about unrolling; it's mostly a way to exploit the fact that a case label can appear anywhere inside a switch statement. That is used a lot for other purposes, some of them very, very handy. The stackless coroutine trick is very useful, for example (see http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html ). On platforms with very little RAM (embedded) it's fantastic to be able to 'port' linear code that might have run in a thread and convert it to a coroutine, which is a lot easier to read and maintain than converting it all to a 'manual' state machine.

Note that with gcc you can reach the same result by using the 'indirect goto' (labels as values, taken with `&&`), which is both more powerful and more dangerous:

    void *lab;
    lab = &&my_label;
    goto *lab;
    my_label: ...
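
To make the coroutine trick concrete, here is a minimal sketch in the spirit of the linked article. The `struct gen` type, the `next_value` function and the explicit state numbers are hypothetical names chosen for illustration; the article itself hides this behind macros.

```c
/* Hypothetical generator state: the resume point plus the loop variable,
 * which has to live outside the function's stack frame. */
struct gen {
    int state;  /* 0 = not started, 1 = resume inside the loop, 2 = finished */
    int i;
};

/* Returns 0, 1, 2 on successive calls, then -1 on every later call.
 * The case label inside the for loop is the same trick as Duff's device:
 * the switch jumps straight back into the middle of the loop body. */
int next_value(struct gen *g)
{
    switch (g->state) {
    case 0:
        for (g->i = 0; g->i < 3; g->i++) {
            g->state = 1;
            return g->i;      /* "yield" the current value */
    case 1:;                  /* execution resumes here on the next call */
        }
        g->state = 2;
        /* fall through */
    case 2:
        break;
    }
    return -1;
}
```

A caller would zero-initialize a `struct gen` and call `next_value` repeatedly until it returns -1. The body still reads like a plain loop, which is the point being made about not having to write the state machine by hand.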

[+] marcosdumay|10 years ago|reply
You are missing a close brace.

I liked a lot how you switch on count and loop on j. If gcc didn't complain, I would never notice it.

[+] 0x400614|10 years ago|reply
That's a great insight. I had to read it a couple times to realize that the comment extended until it found `*/`. This would not pass my code review.
[+] skarap|10 years ago|reply
This is not code - it's a trap!
[+] greenyoda|10 years ago|reply
"You can typedef a function declaration... and declare function prototypes..."

This is actually a very useful technique that I use all the time. It allows you to make sure that a function and any function pointers that point to it always have matching types (since you only have to change the prototype in one place - the typedef).
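
A small sketch of the technique, assuming hypothetical names (`binary_op`, `add`, `mul`, `apply`) that do not come from the article:

```c
/* One typedef names the function *type* (not a pointer to it). */
typedef int binary_op(int, int);

/* Prototypes declared through the typedef. A definition still has to
 * spell out its own parameter list; the typedef cannot be used there. */
binary_op add;
binary_op mul;

int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

/* Pointers to that type are written with an explicit '*'. */
int apply(binary_op *op, int a, int b)
{
    return op(a, b);
}
```

If a definition drifts away from the typedef (say `add` starts taking a `long`), the compiler should reject it as conflicting with the earlier prototype, so the typedef stays the single place where the signature is decided.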

[+] kabouseng|10 years ago|reply
It helps with complex function pointers, makes the code more readable.
[+] TwoBit|10 years ago|reply
Wouldn't you get a compiler error if there was a mismatch?
[+] m3koval|10 years ago|reply
The bitfield example is misleading. Section 6.7.2.1/10 of the C99 standard says:

"The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined"

There is no guarantee on the order of the bits inside a bitfield. The compiler may also introduce padding, e.g. for alignment purposes. This makes bitfields unusable for unpacking binary data.

Unfortunately, you're stuck with shifting and masking to replicate the same effect.
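
As a rough sketch of what the shifting and masking looks like, assume a made-up 16-bit header (4-bit version, 4-bit flags, 8-bit length, most significant bit first). The layout and the names `struct header` and `unpack_header` are purely illustrative:

```c
#include <stdint.h>

/* Hypothetical wire layout, most significant bit first:
 *   bits 15..12  version
 *   bits 11..8   flags
 *   bits  7..0   length
 */
struct header {
    unsigned version;
    unsigned flags;
    unsigned length;
};

/* Assemble the 16-bit word from two bytes in big-endian order, then pull
 * each field out with a shift and a mask. Nothing here depends on how the
 * compiler orders bit-fields or on the endianness of the host. */
struct header unpack_header(const uint8_t bytes[2])
{
    uint16_t word = (uint16_t)((bytes[0] << 8) | bytes[1]);
    struct header h;
    h.version = (word >> 12) & 0xFu;
    h.flags   = (word >> 8) & 0xFu;
    h.length  = word & 0xFFu;
    return h;
}
```

It is more typing than a bit-field struct, but the result is the same on every conforming compiler.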

[+] gdwatson|10 years ago|reply
Implementation-defined behavior is still defined, in this case probably by the platform ABI. It's simply not portable.
[+] buserror|10 years ago|reply
That's one of the few nags I have about C: this syntax is fantastic, and theoretically allows the compiler quite a lot of freedom to optimize access to the bitfield, but the fact that the layout was never standardized means you just can't use it if you want portable code.
[+] saurik|10 years ago|reply

    *(const char * + char *)  The type of int i is converted to 'char *' and multiplied by sizeof(char)
I am pretty certain this explanation does not make any sense: what is really happening here is that the int, for purposes of the addition, is measured in units of the size of the object being pointed to; there is no meaning I know of to adding two pointers.

    /*  This works because "Hello"[5] == 5["Hello"].*/
At this point, you could really just say the following:

    /*  This works because a[b] == *(a + b), and addition is commutative. */
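
A tiny sketch of both points, with hypothetical function names chosen just for this example:

```c
/* Both functions return the same character for any valid index i,
 * because s[i] and i[s] are each defined as *(s + i). */
char index_normal(const char *s, int i)  { return s[i]; }
char index_swapped(const char *s, int i) { return i[s]; }

/* The integer is scaled by the size of the pointed-to type rather than
 * being converted to a pointer: p + 2 on an int pointer advances by
 * 2 * sizeof(int) bytes and so lands on the third element. */
int third_element(const int *p) { return *(p + 2); }
```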
[+] skarap|10 years ago|reply
This reminded me of another C syntax strangeness: "Flexible array member". It allows you to do something like this:

    struct items_with_header {
        int header_field1;
        unsigned int length;
        double array[];
    };

Then allocate enough memory and use the struct to access it.

Used it once in a hash-table implementation.
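
The "allocate enough memory" step could look roughly like this. The helper name `items_alloc` and the zero-filling are assumptions for the sake of the example:

```c
#include <stdlib.h>

struct items_with_header {
    int header_field1;
    unsigned int length;
    double array[];          /* flexible array member, C99 and later */
};

/* Hypothetical helper: a single allocation holds the header and n doubles.
 * sizeof(struct items_with_header) does not include the array elements,
 * so their space is added explicitly. */
struct items_with_header *items_alloc(unsigned int n)
{
    struct items_with_header *p =
        malloc(sizeof *p + n * sizeof p->array[0]);
    if (p == NULL)
        return NULL;
    p->header_field1 = 0;
    p->length = n;
    for (unsigned int i = 0; i < n; i++)
        p->array[i] = 0.0;
    return p;                /* the caller releases it with free() */
}
```

After `items_alloc(4)`, the elements `p->array[0]` through `p->array[3]` are usable, and one `free(p)` releases the header and the array together.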

[+] brandmeyer|10 years ago|reply
This might be my biggest gripe with C's syntax.

    typename identifier[];

means three different things, depending on where it appears. As a function argument, it is a pointer that will be accessed like an array. As a variable with automatic storage duration, it is an array whose size is determined by its braced initializer list. In the middle of a struct definition, it's illegal. And at the end of a struct definition, it is a flexible array member.
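
The first two cases can be seen side by side in a short sketch (the function names are made up for illustration):

```c
#include <stddef.h>

/* As a parameter, `const int v[]` is adjusted to `const int *v`, so
 * sizeof v inside the function would be the size of a pointer; the
 * element count has to be passed separately. */
int sum(const int v[], size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* As a local variable, the initializer fixes the size: v is int[4] here,
 * so sizeof v / sizeof v[0] recovers the element count. */
size_t local_array_length(void)
{
    int v[] = {1, 2, 3, 4};
    return sizeof v / sizeof v[0];
}
```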

[+] asveikau|10 years ago|reply
AFAIK this was only made legal by C99, though I have seen it in code that is much older. (Pre-C99 you would give the "flexible" member an element count inside the square brackets, such as 0 or 1, but allocate as if it were a larger size on the heap.)
[+] white-flame|10 years ago|reply
Unions? Function pointers? Typedefs? While it might be bad for karma to point out, intro to C certainly isn't what I expect to be news to "hackers", as per the site's namesake.
[+] HelloNurse|10 years ago|reply
To consider function types, or unjustified assumptions about bitfield unions, or the use of parentheses to control nesting of arrays and pointers in declarations "strange", one must be averse to the C language to the point of intolerance. Backlash from working on a C compiler and wishing the task were easier?
[+] andrewchambers|10 years ago|reply
I had an "ohh wow" moment when I realized that the keyword typedef is, syntactically, a storage-class specifier. This means it can go anywhere static can go. It just means no variable is introduced, only a type name; otherwise it is the same syntax as declarations.
[+] evincarofautumn|10 years ago|reply
Same. Which, for those unfamiliar, makes these declarations perfectly valid:

    size_t typedef length;

    struct {
      int x, y;
    } typedef foo, *pfoo;
[+] drauh|10 years ago|reply
Wait till you get a load of the Obfuscated C Contest
[+] biot|10 years ago|reply
One of my favorite entries: http://www.ioccc.org/1988/westley.c

  #define _ -F<00||--F-OO--;
  int F=00,OO=00;main(){F_OO();printf("%1.3f\n",4.*-F/OO/OO);}F_OO()
  {
              _-_-_-_
         _-_-_-_-_-_-_-_-_
      _-_-_-_-_-_-_-_-_-_-_-_
    _-_-_-_-_-_-_-_-_-_-_-_-_-_
   _-_-_-_-_-_-_-_-_-_-_-_-_-_-_
   _-_-_-_-_-_-_-_-_-_-_-_-_-_-_
  _-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
  _-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
  _-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
  _-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
   _-_-_-_-_-_-_-_-_-_-_-_-_-_-_
   _-_-_-_-_-_-_-_-_-_-_-_-_-_-_
    _-_-_-_-_-_-_-_-_-_-_-_-_-_
      _-_-_-_-_-_-_-_-_-_-_-_
          _-_-_-_-_-_-_-_
              _-_-_-_
  }
[+] thwest|10 years ago|reply
I was under the impression that C11/C99 only guaranteed that the most recently assigned union member would have an initialized value.
[+] jcranmer|10 years ago|reply
From C11: "If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called 'type punning'). This might be a trap representation."

    union { int x; float y; } b;
    b.y = 10e5;
    printf("%x\n", b.x);

That is legal and well-defined behavior (up to the implementation-defined nature of representations) under C99 and C11, but not under C89 and earlier. Unfortunately, although it was made legal in C99, C99 did retain the program as an example in its (non-normative) list of undefined behaviors, which doesn't help clear up its legality.

Its status under C++11 and C++14 is much more debatable. I recall (I may have bad memory) that an early draft of C++0x had incorporated new C99 text on unions, which would have made it legal, but the wording of unions changed dramatically when unrestricted unions were introduced, which means that assessing its present legality relies very heavily on how you extend initialization to types like int and float.

[+] brandmeyer|10 years ago|reply
Strictly speaking accessing that union both ways ~~~violates the strict aliasing rules~~~ isn't portable. However, it is such a common idiom that GCC and other compilers explicitly allow using unions to get around the strict aliasing rules, so long as the access is always performed through the union.
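
A hedged sketch of the "always through the union" pattern, assuming a 32-bit `float` (checked at compile time) and a hypothetical helper name `float_bits`:

```c
#include <stdint.h>

/* The trick only makes sense if both members have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Returns the object representation of f reinterpreted as a 32-bit
 * unsigned integer. The write and the read both go through the union
 * object itself, never through a pointer cast to the other type. */
uint32_t float_bits(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;   /* read a member other than the one last stored */
}
```

On an implementation that uses IEEE 754 single precision, `float_bits(1.0f)` is `0x3F800000`; the actual bit pattern is still implementation-defined, as noted above.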
[+] nemesisrobot|10 years ago|reply
Isn't the first example undefined behavior? I always thought you shouldn't assign data to a union using one member, then access the data using a different member.
[+] skarap|10 years ago|reply
IIRC it was undefined behavior until C99. Then it became implementation-defined.
[+] halosghost|10 years ago|reply
> All of these examples you'll see here will compile without warnings or errors even with very strict compiler flags in gcc and clang (gcc -Wall -ansi -pedantic -std=c89 main.c)

Umm, that's really not that restrictive. Use `clang -Weverything -std=c11 main.c` if you want strict warnings.

[+] fit2rule|10 years ago|reply
I've found that a great deal of these idioms are explained in the excellent book "Expert C Programming: Deep C Secrets" by Peter van der Linden. It's one of my go-to books for when I want to enhance my 30 years of C-programming experience with a little more insight. I've read it multiple times since it was published, and always learn something new. Check it out if you want to dive more deeply into some of these oddities:

http://archive.arstechnica.com/etc/books/deep-c.html

[+] macintux|10 years ago|reply
The rare technical book that rewards the reader with not merely technical excellence but also robust humor. I still read parts on occasion even though I have no need for C these days.