The C preprocessor is a horrendous way of doing metaprogramming that was implemented because it was relatively easy to do as a separate pass. There's a reason why very few other languages have done it this way.
A good knowledge of the preprocessor is essential for writing obfuscated and underhanded C. For example, the lucky7coin backdoor: https://github.com/alerj78/lucky7coin/issues/1 where the code
IMO, whether or not the C preprocessor is good depends on what you're trying to do and how you do it. I doubt there are any preprocessors or macro systems that can't be used to obfuscate code - that's basically the definition of what they do: modify your code before you compile it. Obviously, any strange/unexplained preprocessor usage should be examined and preferably removed.
The example you gave is not really fair though, because it seems pretty obvious to me that nobody ever looked at that code - it hardly matters that they hid the backdoor in the C preprocessor. If you take a look at the repo, it only has three commits, with the first one (https://github.com/alerj78/lucky7coin/commit/07d7e5fc53e5673...) being a supposed import of the code from the repo it used to exist in, and it's in this commit where the backdoor was inserted. The real issue is that people were running code from someone who appears to be a complete unknown, has no history for his code, and just assumed it was the same as the old code without checking.
This is so absurdly simple and yet devastating. Reading some of the comments on the GitHub issue you posted, this stood out (I don't know anything about lucky7coin):
> So disappointing such code was not reviewed by Vern and team before running it on the server where damage could result.
So this code was actually put into production somewhere at some point -- wow. And cursory code review and compiling from source will do absolutely nothing here.
Interesting. I don't think I ever tried to use a macro in the conditional expression of a #if, except inside a defined() or undef(). From my time on the C committee, I recall that the preprocessor was a royal pain to get right. It has its own set of token rules that aren't the same as C itself, for example.
I am also reminded of the button I used to have that said, "Defining define is undefined."
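For what it's worth, macros do get expanded inside a #if expression (outside of defined), and any identifier still left over after expansion is treated as 0. A minimal sketch of both behaviours, where API_LEVEL, NOT_DEFINED_ANYWHERE and the two helper functions are made-up names for illustration only:

```c
/* Hypothetical configuration macro, for illustration only. */
#define API_LEVEL 3

/* API_LEVEL is macro-expanded to 3 before the #if expression is evaluated. */
#if API_LEVEL >= 2
#define HAS_FAST_PATH 1
#else
#define HAS_FAST_PATH 0
#endif

/* NOT_DEFINED_ANYWHERE is deliberately never defined. In a #if expression,
   identifiers that remain after macro expansion are replaced with 0, so the
   #else branch is taken here. */
#if NOT_DEFINED_ANYWHERE
#define SURPRISE 1
#else
#define SURPRISE 0
#endif

/* Small accessors so the results can be inspected at run time. */
int has_fast_path(void) { return HAS_FAST_PATH; }
int surprise(void) { return SURPRISE; }
```

Calling has_fast_path() should give 1 and surprise() should give 0. The silent "unknown identifier means 0" rule is one reason a typo in a feature macro can go unnoticed.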
This doesn't seem quite right. Did you maybe mean "#undef FOO" or "! defined(FOO)"? Whether BAR gets expanded or not, in your example it looks like it would always evaluate true. Or am I misunderstanding the ambiguity?
It might be telling that I also don't understand the Clang bug report as written. I think there are typos in the examples. Is the switch from "HAVE_FOO_BAR" to "HAVE_FOO" in the first example intentional? Is the construct "#defined" (with a final 'd') intentional in the second?
#2 is incorrect. Being sensitive to line breaks does not make a grammar context-sensitive. It just means you have to treat line breaks as tokens rather than ignorable whitespace (which is exactly what the context-free grammar given in the C11 standard does).
Same with the bit about concatenating tokens. Every single one of those examples has a static parse tree, which, for the C preprocessor, is a sequence of tokens and directives. The author seems to be confusing the preprocessor's parse tree with the effect it has on the underlying text.
(Yes, the output of the preprocessor is dependent on what you define, but that has nothing to do with the grammar. What the author claims is like saying a Lisp is context-sensitive because the factorial function produces different values for different inputs!)
Now, if you could do this:
#define foobar define
#foobar x 123
x
and get "123", that would be a context-sensitive grammar. But that is NOT a thing you can do!
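To illustrate the token-pasting point: in the sketch below the directive lines and their token sequences are fixed, and only the result of expansion depends on what PREFIX happens to be defined as. All names here (CAT, STR, PREFIX, foobar and the helper functions) are hypothetical, picked just for the example:

```c
/* Two-level helpers so that arguments are macro-expanded before ## and #. */
#define CAT_(a, b) a##b
#define CAT(a, b) CAT_(a, b)
#define STR_(x) #x
#define STR(x) STR_(x)

/* Hypothetical macro; changing it changes the output, not the grammar. */
#define PREFIX foo

static const int foobar = 123;

/* CAT(PREFIX, bar) expands PREFIX to foo, then pastes foo and bar. */
int read_pasted(void) { return CAT(PREFIX, bar); }

/* The argument is expanded first, then stringified. */
const char *pasted_name(void) { return STR(CAT(PREFIX, bar)); }

/* Operands of # are not expanded, so this stringifies the name PREFIX. */
const char *unexpanded_name(void) { return STR_(PREFIX); }
```

Here read_pasted() returns the value of foobar, pasted_name() yields the string "foobar", and unexpanded_name() yields "PREFIX".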
I hate to say it, but I was rather unimpressed by this list, and nothing in it surprised me. While I certainly agree that the C preprocessor is a relic, and has not weathered the test of time well, I would suggest that a number of the supposed infelicities mentioned in this article stem from the misleading idea that the preprocessor is an integral part of the C language proper, when it is better thought of as its own language (and one that was traditionally done by a completely separate program). The preprocessor does things differently than the rest of C, because it's not C. It is a text-processing language of convenience, provided specifically for doing things that C itself cannot (or should not) do.
I've written a C preprocessor and I agree that the language standard documents are ambiguous and incomplete. The best I could do was hack on it until it matched GCC's preprocessor well enough to compile Linux.
I don't recall all the horrid details, but one case that I do remember driving me nuts was the use of #if/#endif in the argument to a function-like macro.
Has there been any notion of a replacement Meta/Macro language for C? Something open source. Of course pre-preprocessing one's files and the complexity that might add to the build system are unattractive but I'd still be interested if someone has attacked this problem.
Much of what makes 'C' annoying can be made less painful by referring to static/const struct tables/arrays. Those are a prime candidate for generation.
You don't have to keep the preprocessing of files as part of the mainline build, but there's something to be said for it - sort of "make GENERATE_ALL_THE_THINGS" might run the preprocessing { Python/Tcl/Perl/bash/even 'C' } scripts for you.
If the generators just emit .h files, that can be pretty good. You're still left with something #ifdef-ey to select them, based on #defines or -D options.
You might even go so far as to dynamically load these tables if that can make sense. The ld linker can directly link in blobs.
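A rough sketch of what such a table might look like in practice. The struct, the table contents and the lookup helpers are all invented for illustration; in the workflow described above, the struct and array would live in a header written by a generator script rather than by hand:

```c
#include <stddef.h>
#include <string.h>

/* --- would normally be emitted by a generator into a .h file
       (hypothetical name: error_table.h) --- */
struct error_entry {
    int code;
    const char *name;
};

static const struct error_entry error_table[] = {
    { 1, "EPERM"  },
    { 2, "ENOENT" },
    { 5, "EIO"    },
};
/* --- end of generated part --- */

#define ERROR_TABLE_LEN (sizeof error_table / sizeof error_table[0])

/* Look up a name by code; returns NULL when the code is not in the table. */
const char *error_name(int code) {
    for (size_t i = 0; i < ERROR_TABLE_LEN; i++) {
        if (error_table[i].code == code) {
            return error_table[i].name;
        }
    }
    return NULL;
}

/* Look up a code by name; returns -1 when the name is not in the table. */
int error_code(const char *name) {
    for (size_t i = 0; i < ERROR_TABLE_LEN; i++) {
        if (strcmp(error_table[i].name, name) == 0) {
            return error_table[i].code;
        }
    }
    return -1;
}
```

The hand-written part never changes when entries are added, which is what makes the table a convenient target for generation.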
The module and template features (along with static if, if the committees figure out what to do in that area) in the newest versions of C++ together get pretty close to replacing the C preprocessor.
My school of thought would be to limit it to just #include, #if, #else, #endif, and non-recursive, single-word-only #define / #undef. Force everything to be one per line, and call it a day.
Macros should always be the absolute last resort to doing anything. Stepping through code in gdb with some "creative" macro-based API is almost as bad as C++.
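One commonly cited reason for preferring functions over function-like macros (not necessarily the one meant above) is that a macro can evaluate its argument more than once. A small sketch, with every name made up for the example:

```c
/* Function-like macro: the argument text appears twice in the expansion. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Equivalent function: the argument is evaluated exactly once. */
static inline int square_fn(int x) { return x * x; }

/* Counter with a side effect so the number of evaluations is observable. */
static int calls;
static int next_value(void) { return ++calls; }

/* Returns how many times next_value() ran when passed through the macro. */
int macro_call_count(void) {
    calls = 0;
    (void)SQUARE_MACRO(next_value());
    return calls;
}

/* Returns how many times next_value() ran when passed to the function. */
int function_call_count(void) {
    calls = 0;
    (void)square_fn(next_value());
    return calls;
}
```

macro_call_count() reports 2 evaluations and function_call_count() reports 1. The function is also something a debugger can step into, which a macro is not.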
I'm 95% sure the last example in #3 is undefined behaviour. #(a b c) is not valid, so evaluating it with multiple levels of indirection probably is a compiler bug for not erroring out.
And the last 3 or 4 are odd, but were needed for some of the hacks used in the early days of C (and some are almost certainly used in the Linux kernel source today).
pjc50 | 10 years ago
expands to
DSMan195276 | 10 years ago
yid | 10 years ago
kazinator | 10 years ago
nly | 10 years ago
evmar | 10 years ago
http://reviews.llvm.org/D15866
clang and gcc will pick the #if branch while Visual Studio will take the #else branch.
rootbear | 10 years ago
random_upvoter | 10 years ago
nkurz | 10 years ago
cremno | 10 years ago
https://gcc.gnu.org/onlinedocs/cpp/Computed-Includes.html
cyphar | 10 years ago
speeder | 10 years ago
colanderman | 10 years ago
breadbox | 10 years ago
robertelder | 10 years ago
TazeTSchnitzel | 10 years ago
http://conal.net/blog/posts/the-c-language-is-purely-functio...
pklausler | 10 years ago
DubiousPusher | 10 years ago
ArkyBeagle | 10 years ago
pcwalton | 10 years ago
ctstover | 10 years ago
cyphar | 10 years ago
cyphar | 10 years ago
biot | 10 years ago