item 10157018

C Programming Guidelines

176 points | colinprince | 10 years ago | github.com | reply

101 comments

[+] ggreer|10 years ago|reply
I'm the author of The Silver Searcher (aka ag)[1]. I wrote my first C program almost 20 years ago, and I have some not-very-nice things to say about this guide. I hope the author won't take it personally.

First, Zed Shaw's Learn C the Hard Way is not worth reading. I agree with most of Tim Hentenaar's criticisms.[2] It's confusing, incomplete, and overly opinionated. If you want to get started with C, read the K&R book. It's not perfect, but it's better than anything else I know of.

As for the guide itself, the quality of its advice varies greatly. Some bits are great. Some are good. Some are controversial. Some are dangerously double-edged. And some are Not Even Wrong. It would take too long to go into detail, so I'll just say: Please don't blindly follow this guide. Explore other codebases (including the ones the author linked to). Talk to other programmers. Write your own projects, and get feedback from those who are more knowledgeable.

You might notice that this advice generalizes to every language. That's because C isn't special. If anything, the language itself is simpler than most. C has just had more time to accumulate cruft, both technical and cultural. So don't feel intimidated. Once you learn those bits of historical trivia, you'll be fine.

1. https://github.com/ggreer/the_silver_searcher

2. http://hentenaar.com/dont-learn-c-the-wrong-way

[+] btrask|10 years ago|reply
Thanks for the comment. I'm the author of the article. I'd really appreciate some more specific feedback, particularly on the parts that you think are dangerous or "not even wrong."

I'll take a look at the Silver Searcher's code and see what pearls of wisdom I can gain.

[+] asveikau|10 years ago|reply
This seems like a pretty good list. The focus on error handling is spot-on; most good C is simply that.

One of my favorite tricks is to make your success case and your error cases look largely the same.

Say you have N allocations and N things that can fail. Lots of people get into trouble when they do:

    void *p, *q, *r, *s;

    p = a();
    if (!p) { return ERROR; }
    q = b();
    if (!q) { free(p); return ERROR; }
    r = c();
    if (!r) { free(p); free(q); return ERROR; }
    // etc..
And this madness just goes on and on. Add a new allocation? You potentially have to change N cleanup blocks.

I breathe a lot easier when I can do [note we abuse the fact that free(NULL) is a no-op]:

       void *p = NULL;
       void *q = NULL;
       void *r = NULL;

       p = a();
       if (!p) goto cleanup;
       q = b();
       if (!q) goto cleanup;
       r = c();
       if (!r) goto cleanup;

    cleanup:
       free(p);
       free(q);
       free(r);
What if the buffer at "r" needs to return something to the caller? It turns out it's really easy to transfer ownership in this pattern:

       // Success case.  Give `r` to the caller.
       *callerPointer = r;
       r = NULL;

       // Then we fall through to the failure block...
       // r is null now so free(r) is harmless.
    cleanup:
       free(p);
       free(q);
       free(r);
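Putting the pieces together, here's a compilable sketch of the whole pattern. The names (`a`, `b`, `c`, `make_thing`) are stand-ins for illustration, not anything from a real codebase:

```c
#include <stdlib.h>

/* Stand-ins for any three fallible allocations (illustrative only). */
static void *a(void) { return malloc(16); }
static void *b(void) { return malloc(32); }
static void *c(void) { return malloc(64); }

/* On success, ownership of the c() buffer is transferred to the caller
 * via *out and 0 is returned; on failure, -1. Every path runs the same
 * cleanup block, relying on free(NULL) being a no-op. */
int make_thing(void **out)
{
    int rc = -1;
    void *p = NULL, *q = NULL, *r = NULL;

    p = a();
    if (!p) goto cleanup;
    q = b();
    if (!q) goto cleanup;
    r = c();
    if (!r) goto cleanup;

    *out = r;   /* success: give r to the caller */
    r = NULL;   /* r is NULL now, so free(r) below is harmless */
    rc = 0;

cleanup:
    free(p);
    free(q);
    free(r);
    return rc;
}
```

Adding a fourth allocation means one new NULL initializer, one new `goto cleanup`, and one new `free()`, instead of touching every earlier error branch.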
[+] mieko|10 years ago|reply
This is The Way To Do It. I take minor issue with calling free(NULL) abuse, though (I know you get it, but allow me to soapbox). I've known programmers who get squeamish relying on this, as if it's a hack, even though it's been mandated by the standard forever. It's as if they believe someone would put together a fully functional libc and somehow miss that simple requirement.

I think the C standard library would be easier to work with if they'd taken this further and required, for example, fclose(NULL) to be a no-op.

[+] userbinator|10 years ago|reply
That's a very common pattern in Linux kernel code, where there are also nested cleanups, and it's one of the most suitable use cases for a goto.

Although in your example, if the code really was doing only that much, I'd recognise the ||/&& "ladder" structure and write it like this instead:

    void *p = NULL, *q = NULL, *r = NULL;

    if (!((p = a()) && (q = b()) && (r = c()))) {
        free(p);
        free(q);
        free(r);
    }
[+] stevewepay|10 years ago|reply
Not that error handling is unimportant, but my opinion is that C is about focusing on resource usage, and keeping very close track over giving back any resource you take, including memory, etc. If you do that, then the number of hard-to-debug runtime bugs drops significantly.
[+] VOYD|10 years ago|reply
"goto", really?
[+] hyc_symas|10 years ago|reply
"Put one statement conditionals on same line as condition: if (rc < 0) goto fail;"

This is bad advice. A source-level debugger only works in source line numbers; if you ever need to set a breakpoint to find out when an error condition occurs you need to be able to set it on the action statement "goto fail" and not be bothered by the breakpoint tripping every time the if() gets evaluated.

Always separate the conditional from the statement.
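For example (the function and its failure condition are made up for illustration): with the conditional and the action on separate lines, a breakpoint set on the `goto fail;` line trips only when the error path is actually taken, not on every evaluation of the if().

```c
#include <stdio.h>

/* A toy fallible operation. */
static int do_work(int input)
{
    int rc = (input < 0) ? -1 : 0;

    if (rc < 0)
        goto fail;   /* a breakpoint here fires only on the error path */

    return 0;

fail:
    fprintf(stderr, "do_work failed: rc=%d\n", rc);
    return rc;
}
```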

[+] fishanz|10 years ago|reply
I totally agree, and that immediately popped out at me. When I started coding I would do this to 'save space', until burned by the breakpoint as you point out. Then I started putting it on the next line, indented, until a more seasoned developer told me about the gotchas of not having braces around a 'one-liner'. So now what used to take one line takes 4 lines, but I'm sure it will circumvent a 'goto fail' bug at some point!
[+] btrask|10 years ago|reply
Hey Howard, thanks for the feedback. That's a good point that I hadn't really run into (probably depends on personal debugging style). I still think keeping error handlers small is worth it, but I will mention that tradeoff.
[+] electroly|10 years ago|reply
Maybe your source-level debugger only works in line numbers. Visual Studio doesn't. It knows that "goto fail" is a separate statement and you can place a breakpoint directly on that, not on the if statement, even though they're on the same line.
[+] CardenB|10 years ago|reply
Plus it just seems like a readability issue in the long run.
[+] caf|10 years ago|reply
Use signed char for strings...

What? No. Use plain "char" for strings (which may be signed or unsigned, depending on the environment).

If you use "signed char" in an environment where plain "char" is unsigned then a comparison like:

  string[i] == '\xff'
will always be false: the character constant '\xff' has the value 255 in that environment, which a signed char element can never hold.
[+] metromint|10 years ago|reply
It won't make a difference for strings. But in C you can use a char to do math, and there it will make a difference.

In fact, when working in constrained memory environments, like embedded 8-bit applications, a char will often be used to do math, and then it makes a big difference. This is because C has no byte type by default.

[+] btrask|10 years ago|reply
You're right, and that statement was poorly worded. If you check my project, I use plain char, which is signed on most platforms. The benefit of using unsigned char for binary data is the same, but the compiler warnings won't help on certain platforms.
[+] geofft|10 years ago|reply
The security brag list at the bottom linking qmail and seL4 is a little funny.

qmail has the property that (if it is indeed secure, which is a disputed claim) it's secure by virtue of djb's superhuman awesomeness. I'm happy to believe that djb can come up with secure elliptic curves, but I'm under no delusions that reading all his papers and coming up with my own elliptic curves is in any way a good idea. Similarly, even if I take the djb way of software writing to heart, I don't see a reason to believe that I could write software with qmail's level of robustness. I am just not a world-class expert the way that he is.

seL4, meanwhile, has a formal model of the C programming language in the proof assistant Isabelle/HOL, a formal model of what they want their C codebase to do (and not do), and a formal proof that, in fact, their C codebase has the properties they hope that it has. The C is handwritten, but it may as well not be; you can't do feature additions, significant refactoring, etc. to the C without needing to replicate those all in a language that is completely unlike C.

If these are the best arguments in favor of writing secure software in C, you don't really need any counterarguments....

[+] pjmlp|10 years ago|reply
If you read the history of C, written by Ritchie, he says that Johnson created lint because it was already clear in 1979 that not everyone was able to write correct C code.

And here we are, with millions of dollars/euros/yen/... spent on all forms of analyzers, theorem provers, and hardware extensions (MPX, CHERI) to work around language flaws.

[+] jmspring|10 years ago|reply
Ugh --

  Older C isn't better when it comes to security and avoiding bugs

  K&R C is much beloved, but that innocent era is gone

  We can't afford to fight each other, or C will really die

--

Older doesn't mean there wasn't error handling and comprehensive bug checking. Some of us have been writing defensive code for years. Age of "style" does not imply approach to security. OpenBSD has a great deal of C code and is security conscious.

Really, articles like these are superfluous; the fact is that there are shitty coders out there in any language.

[+] pjmlp|10 years ago|reply
There is only one that allows for security exploits by design.
[+] okasaki|10 years ago|reply
> Immediately clear pointers after freeing

This can be misleading...

    void foo(char *buf) {
        ...
        free(buf); 
        buf = NULL;
    }

    void bar() {
        char *b = malloc(123);
        foo(b);

        // b isn't NULL...
    }
My advice is: always compile and run with AddressSanitizer. Take the time to fix errors/leaks as soon as possible.
[+] weland|10 years ago|reply
Why are you freeing b in foo?

It's confusing if you're mallocing in one scope and freeing it in another one (foo is especially confusing, besides being a disaster ready to happen). But if you write bar() like this:

    void bar() {
        char *b = malloc(123);

        foo(b);

        free(b);
        b = NULL;
    }
it looks all right.

Not that I'm advocating this kind of use. I have... mixed feelings about it. But I've seen it used and it doesn't have to be confusing.

Its primary merit is that it makes use-after-free cases immediately obvious, since they're going to trigger a segfault.

But there are better ways to achieve that.

[+] btrask|10 years ago|reply
If `foo` doesn't own `buf`, it should be declared `char * const` (or `char const * const`). If `foo` takes ownership of `buf` (and is intended to free it), it should take a pointer to the pointer, so it can clear it for the caller. This is one way to write it:

    void foo(char **const bufptr) {
        ...
        free(*bufptr); *bufptr = NULL;
    }
You can dereference `bufptr` at the top of the function to make the code more readable. This is how (almost) every object freeing function works in my project.
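A self-contained sketch of that pattern (the name `thing_free` is hypothetical, not from the article's project):

```c
#include <stdlib.h>

/* Takes ownership via a pointer-to-pointer: frees the buffer and
 * clears the caller's copy, so the caller can't use it after free. */
static void thing_free(char **const bufptr)
{
    free(*bufptr);
    *bufptr = NULL;
}
```

After `thing_free(&b)`, `b` is NULL, so a later accidental dereference fails loudly instead of being a silent use-after-free; calling it a second time is also safe, since free(NULL) is a no-op.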
[+] viraptor|10 years ago|reply
Valgrind / asan / clang-analyzer on their own take care of most of the issues listed under memory management.
[+] zatkin|10 years ago|reply
Personal preference does not imply that something is good, especially for things like

"Single line if possible: if(rc < 0) return rc;"

[+] voltagex_|10 years ago|reply
Lines like that cause conditions to be missed (e.g. Apple's goto fail bug)

    if (rc < 0)
    {
      return rc;
    }
always IMO.
[+] tadfisher|10 years ago|reply
Agreed. The author claims to write a "substance" guide rather than a "style" guide, but many of these points boil down to style over substance.
[+] jasonwatkinspdx|10 years ago|reply
"Use struct thing thingptr[1]; to get pointers with inline storage ... You should pretty much always use -> instead of . ... I think this is the most important trick for making C code beautiful"

I don't really get what they're after with this section. Can anyone shed some light?

[+] andymurd|10 years ago|reply
It is similar to:

    struct thing athing;
    struct thing *thingptr = &athing;
Then there should be no confusion as to whether you write:

    athing.member = 12;
or:

    thingptr->member = 12;
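A self-contained sketch of the array-of-one trick (struct and member names are made up):

```c
/* Illustrative struct. */
struct thing {
    int member;
};

/* An array of one element gives inline (here, stack) storage, but the
 * name decays to a pointer, so you use -> everywhere, exactly as you
 * would if the struct were heap-allocated. */
static int demo(void)
{
    struct thing thingptr[1];

    thingptr->member = 12;   /* same object as thingptr[0].member */
    return thingptr->member;
}
```

The appeal is uniformity: code written against `->` keeps working whether the storage later moves to the heap or stays inline.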
[+] mahyarm|10 years ago|reply
If all it takes is one human screw-up in 3 lines of code to get full remote execution, you will inevitably get security bugs with that code. A best-practices guide will not work, because people will not execute on their best practices 100% of the time. A language specification, or at least a linter, will enforce it 100% of the time in a practical comparison. This is why people want to sunset C-family languages.
[+] tmuir|10 years ago|reply
I do mostly embedded programming, which is overwhelmingly dominated by C. The reason I think C will probably never be replaced as the dominant low level language is inertia. Every microprocessor manufacturer knows they have to provide drivers, libraries, and board support packages for their parts. If they don't, no one will use those processors, because of the extra effort involved, and the fact that other companies do provide them. All of these drivers, libraries, and board support packages are written in C. Writing them in something other than C is basically equivalent to not writing them at all, because instead of forcing developers to reinvent the wheel, you're forcing them to use a language most of them don't know. In the end, it's the exact same problem: a large amount of effort required to get to the point where you can start implementing your application.

This same logic applies to real time operating system vendors. Everything is written in C. The minute they start writing their code in a different language is the minute their competitors become that much more attractive. On top of that, one of the major selling points of the major RTOSes is that they have mature code bases with decades of bug fixes and products using them.

Perhaps this presents an opportunity for disruption, but it seems like a steep hill to climb.

[+] woodman|10 years ago|reply

  Performance
    The restrict keyword should be used rarely if ever
wut?
[+] btrask|10 years ago|reply
It's usually penny wise, pound foolish.
[+] haberman|10 years ago|reply
> Do not worry about 1% speedups. SQLite has gotten significantly faster from lots of ~1% boosts ...And it'll never be as fast as MDB

What MDB is being referred to here?

[+] sleepydog|10 years ago|reply
If I had to guess from looking at his git repo, he's referring to LMDB:

http://symas.com/mdb/

SQLite and LMDB aren't exactly equivalent, but I guess that is his point: you can gain a lot of performance just by understanding your use case and what tradeoffs you can make.

[+] jimjag|10 years ago|reply
Bad C programmers will simply fault C itself for their own deficiencies. I find most of this advice really bad and not recommended.
[+] deathanatos|10 years ago|reply
> Use `signed char` for strings

If you're dealing with POSIX / the C stdlib, mirror it and use char*. (The string version of "Respect the integer types used by each API"; if you're coding for Windows you'll likely need LPWSTR or whatever they call it now.) But if you're storing UTF-8 string data, you _want_ unsigned char (and maybe uint8_t[2]): if you start inspecting the code units, especially to decode them to code points, you're going to need comparisons and bit manipulation on them, and it's too easy to write `code_unit < 0x80` or something similarly silly (e.g., while masking bits to see if it's a continuation byte). It's like the article from earlier today, "Should I use Signed or Unsigned?"[1], states: "Prefer unsigned when using bitwise operators".

> Limit recursion

Recursion is fine; you simply need to be aware of your worst-case stack depth. You're looking, really, for the complexity of your stack usage, the big-O of your stack size. If you're doing something like recursing down an in-memory balanced binary tree, it's O(log(n)) in the worst case: at most ~64 stack frames on a 64-bit CPU. Can you fit that many on the stack? Certainly that's heavier than a normal function call, but it's still a finite, computable size.

If you can't compute that / prove an upper bound, then I agree.
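A quick sanity check on that bound (the helper name is made up): a balanced binary tree halves the remaining nodes at each level of recursion, so the depth grows only logarithmically with the node count.

```c
#include <stddef.h>

/* Worst-case recursion depth for a balanced binary tree of n nodes,
 * ceil(log2(n + 1)): each level halves the remaining nodes. */
static size_t balanced_depth(size_t n)
{
    size_t depth = 0;
    while (n > 0) {
        n /= 2;
        depth++;
    }
    return depth;
}
```

Even a million nodes comes out to only 20 frames, which is why bounded recursion over balanced structures is generally safe.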

> Aside: any language that uses keywords like public and private to determine visibility instead of placement/scoping is bad

If not with keywords, how should visibility be determined? (How does "placement/scoping" determine visibility?)

> Use get/init or create/copy/alloc in function names to indicate ownership versus borrowing for return values

Inevitably, I feel like you end up creating some structure; let's call it a Foo. So you then have a Foo *Foo_create() and a Foo_destroy(Foo *). Some action performable on Foo? Foo_reticulate_splines(Foo *, …). Combined with,

> You should generally track used and unused space in dynamic arrays,

and the overhead of writing functionality to deal with dynamic arrays in C, the entire section about managing strings, and you've come to the very reason I left C: I'm already writing C++. Foo_create is a ctor; Foo_destroy a dtor; Foo_reticulate_splines a member function. Combined with vectors & strings taking care of day-to-day memory management, I'm left with little reason to use C.[3] RAII adds some degree of object lifetime management that simply isn't present in C (but does not add the compile-time checks that I'm starting to love Rust for).

[1]: http://blog.robertelder.org/signed-or-unsigned-part-2/

[2]: I know C++ got some decent types for Unicode _very_ recently; I'm not aware if C got those.

[3]: I am not suggesting C++ to be the language you _should_ be using. I simply cannot justify ever using C over it.
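The naming convention described above, sketched in C (the Foo type and its member are invented for illustration):

```c
#include <stdlib.h>

typedef struct Foo {
    int splines_reticulated;
} Foo;

/* "create" signals that the caller owns the result and must
 * eventually pass it to Foo_destroy. */
Foo *Foo_create(void)
{
    return calloc(1, sizeof(Foo));
}

void Foo_destroy(Foo *foo)
{
    free(foo);   /* free(NULL) is a no-op, so NULL is fine here */
}

/* An action performable on a Foo: borrows the pointer, does not own it. */
void Foo_reticulate_splines(Foo *foo)
{
    foo->splines_reticulated = 1;
}
```

This is essentially a hand-rolled ctor/dtor/member-function triple, which is the point of the comment: the pattern exists in C, but the language gives you no help enforcing it.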

[+] pjmlp|10 years ago|reply
I never liked C.

Back when I got to learn it, around 1993, I had already used Turbo Pascal 3, 5.5 and 6.0.

C seemed so primitive and unsafe compared to Turbo Pascal's features.

However the same person that got me into C, also showed me C++.

There it was, a language that could be used as safely as Turbo Pascal, provided one made use of the language features instead of the C copy-paste compatibility.

When Turbo Pascal stopped being a possibility, I joined ranks with the C++ side, in the C vs C++ flame wars.

I even taught C++ at the university as a senior student.

Only used C instead of C++, when asked to do so by university teachers or customers.

[+] jeffreyrogers|10 years ago|reply
For memory allocation you often don't need to use malloc. Many (most?) programs can mmap a large region of memory and allocate from that. Then when you're done you just unmap the region. This is much easier to use since you only have to remember to unmap once as opposed to handling each allocated pointer individually.

Edit: Here is a paper that touches a bit on what I'm describing: http://people.cs.umass.edu/~emery/pubs/berger-oopsla2002.pdf

Also, to clarify, I'm not saying to allocate yourself a big piece of memory and then reimplement malloc on top of it. You literally just get a big piece of memory, and every time you want to allocate something new from it, you return the base address plus the current offset and then advance the offset by the allocation's size.
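A minimal bump-allocator sketch of what's described (names are made up; malloc is used for the region instead of mmap to keep the sketch portable, and alignment handling is omitted for brevity):

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    char  *base;
    size_t size;
    size_t offset;
} Arena;

int arena_init(Arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = size;
    a->offset = 0;
    return a->base ? 0 : -1;
}

/* Bump allocation: return base + offset, advance offset by the request. */
void *arena_alloc(Arena *a, size_t n)
{
    if (a->size - a->offset < n)
        return NULL;
    void *p = a->base + a->offset;
    a->offset += n;
    return p;
}

/* One call releases everything allocated from the arena. */
void arena_free(Arena *a)
{
    free(a->base);
    a->base = NULL;
}
```

The single `arena_free` is the appeal being described; the replies below point out what it costs you in hardening.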

[+] rc4algorithm|10 years ago|reply
No no no no no. When you do this, you are:

1) Reinventing malloc for no good reason

2) Opening yourself up to a whole world of security holes

See Bob Beck's talk on the first 30 days of LibreSSL. Heartbleed wouldn't have been nearly as catastrophic if OpenSSL just used the system memory allocator like normal people.

[+] caf|10 years ago|reply
Setting aside the wisdom of this approach, if you were going to do this why wouldn't you just use malloc() to allocate the single large arena that you're going to allocate from yourself? It's still just one free().

In other words, the malloc-versus-mmap distinction here isn't the important one, it's the custom allocator versus system allocator one.

[+] userbinator|10 years ago|reply
For memory allocation you often don't need to use malloc.

True.

Many (most?) programs can mmap a large region of memory and allocate from that.

You don't even need to do that for many applications. Coming from an embedded systems background where dynamic allocation is highly discouraged and almost never necessary, the amount of superfluous allocation/deallocation I see in a lot of code is astounding. Maybe it's because of my experience, but I can often find a way to do something without dynamic allocation, with what usually turns out to be simpler code (thus less chance of bugs) and O(1) instead of O(n) space.

But when you do need dynamic allocation, malloc() isn't bad at all.

[+] btrask|10 years ago|reply
Hey Jeff, I had to double check because I couldn't believe this was you. I'm a big fan of the Jeff and Casey Show!

The problem with doing custom allocation is not bugs in the allocator per se, but that your custom allocator will probably not be hardened against other bugs the way some allocators are. Specifically, you will piss off the OpenBSD people, who have put a lot of work into their allocator's "exploit mitigations."

A bump allocator is pretty much guaranteed to let overflows (reads and writes) touch other valid memory. A hardened allocator will use guard pages and other tricks to prevent those bugs from being exploitable.

One thing you can do is use bump allocators within "security zones." For example data related to a single user can be allocated together without much risk, but data from separate users is kept separate.

This is very different from the world of game programming. I've been wondering lately if there will be some sort of bifurcation of the field.