>Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming. (See Brooks p. 102.)
I came to the same conclusion after a while. In the end, code is there to process data. (There is metaprogramming, and things like state machines, but most programs do hold data in some kind of structure to be processed.)
>Simple rule: include files should never include include files.
For small programs, that's a good rule*. For larger code bodies, it's counterproductive. If my module uses foo, there's no reason for my module to know that foo depends on bar. If the next version of foo drops its bar dependency and now uses baz, which in turn requires fat, foo's clients shouldn't have to be rewritten.
You can sometimes ameliorate the breakage by keeping dependencies out of header files (and that's generally a good idea), but sometimes you can't.
My personal rule is exactly the opposite: every header should compile with no prerequisites. For that reason, the vast majority of my foo.h files are included as the very first line of foo.c. If foo.c doesn't compile, I know to fix foo.h.
Pike's concern that lexical analysis is the most expensive compiler phase is way out of date.
* In 1989, there were no large programs. (-:
>>Simple rule: include files should never include include files.
> Yes, please!
Let's say I have a C++ class defined in a header that uses std::string, so in version 1.0.0 I require that anyone who uses my header already has <string> included. Then, in version 1.1.0, I add to the class so that the header now requires std::map. If the caller doesn't have <map> included, there will now be a compilation error! How is that better than simply including <string> and <map> in the class header itself?
Most of the advice holds true, but there's one bit you should happily ignore: don't uglify your code with external include guards. That is, put your include guards in the included file.
Modern compilers perform include guard optimization: http://www.bobarcher.org/software/include/results.html
There is one exception to this rule which I think still holds true: MSVC doesn't perform this optimization correctly, perhaps due to its love affair with precompiled headers.
I personally prefer solving that issue with #pragma once, as I find the risk of an include guard name clash higher than the chance of having the same file accessible through two different hardlinks or copies, let alone compiling anything on a network share. It's also a lot more readable.
Lots of good advice from a long-time programmer (https://en.wikipedia.org/wiki/Rob_Pike). I have bookmarked these notes before, because his simple but effective tips resonate with me. From the article:
> Algorithms, or details of algorithms, can often be encoded compactly, efficiently and expressively as data rather than, say, as lots of if statements.
I have read elsewhere about moving your code complexity into your data, but I can't find that other article; it's hard to find any mention of this strategy at all. But I have found it to be true: moving details from PHP into the database results in shorter code overall. The first example that comes to mind is replacing a bunch of if statements with one or more columns in the database, like for some kind of categorization.

http://www.catb.org/esr/writings/taoup/html/ch01s06.html#id2...
ESR actually references Pike's "Notes on C Programming".
Generalize the business logic into things you do all the time, with data that you look up as part of the context.
In that way your program is actually more flexible, there aren't magic numbers or magic things that happen only in some cases. Those have been moved to co-residence with the data they belong with.
These are instructive to look at now. Important to recall the '1989' date of course, but with hindsight...
I love Rule 5 ("choose the right data structures and the algorithms will almost always be self-evident"), especially when combined with STL and strong typing. There is a degree of irony in taking this advice from Pike, given the design of Go.
Much of the material on complexity assumes you are hand-coding things from scratch. I am happy to take on complexity (as long as I understand what I'm getting into) from well-designed libraries rather than building something "simple and robust" from scratch. The statement about binary trees vs splay trees is illustrative; unless I need to see the bare data structure again I would much rather take on something from the STL, complex or otherwise.
Generally speaking all this stuff is good, but I'd add that his note on include files is dated and I wouldn't recommend it anymore. Any compiler worth half its salt at this point will recognize the include guard pattern and not parse through include'd files multiple times, so the worry about wasting tons of time on that is largely gone.
The big problem with what he's suggesting is that if you're designing a fairly big system with lots of decently small headers (which is generally good: simple headers with easy-to-read APIs), you'll end up with a crazy number of includes in every file, and if you change something to use a new dependency, you'll have to change every location it is include'd in as well. There is something to be said for avoiding things like circular dependencies, but this requirement doesn't really make those any harder to create; it just adds problems and annoyances. It is not a very scalable solution.
If you look at the Linux kernel source (arguably one of the largest and most successful C programs), each source file has around 10 to 30 includes at the top (or more in some cases), and that's with the headers including other headers. If instead Linux had taken the approach Rob is recommending, that number would probably be an order of magnitude larger and extremely hard to manage, even if they combined a bunch of their headers to reduce the total number (which, again, I would consider a huge anti-pattern).
I think I agree that includes-within-includes are probably a necessity.
But I am not sure that it allows things to be broken into "simple headers with easy-to-read APIs". That is true for headers that are small and only #include system headers and other basic dependencies. But if your own code has a nest of interrelated headers, then a single big header file is probably easier to read.
Doesn't every compiler support '#pragma once' these days too, making include guards largely pointless (and like most copy/paste patterns, error-prone) boilerplate?
> For example, binary trees are always faster than splay trees for workaday problems.
I assume by this Pike means unbalanced binary trees (if not, then red-black trees are decidedly not simpler than splay trees, especially if you need to delete). In that case, I don't really believe it. Nobody uses unbalanced binary trees anymore for good reason: they have awful performance when you do something as simple as inserting keys in sorted order.
I'd like to see Pike vs. the MISRA guidelines (https://en.wikipedia.org/wiki/MISRA_C). Rob's notes here are not about the kind of safety-critical, hopefully small, programs that MISRA claims to improve.
It would be interesting to hear his thoughts about just those kinds of programs, and whether pointers and function pointers are still helpful.
It's useful to apply the Steve Yegge political-axis metaphor[1] here. C forces a fairly conservative approach because it stocks a full arsenal of footguns. Pike and most of the early Unix guys, though, are about as liberal as they can be within C's constraints. MISRA, on the other hand, is Idaho-survivalist-camp conservative.
Both parties have good reasons for their ideologies. Early Unix programs were tiny enough that it was easy to keep an entire application's rules in your head and take full advantage of them. So it makes sense to play fast and loose. MISRA-compliant systems, OTOH, are developed by large teams where no member understands the whole system, and the consequences of getting something wrong are measured in attorney man-years.
[1] https://plus.google.com/110981030061712822816/posts/KaSKeg4v...
Lysator has been around since forever. The English information page [1] says it was founded in 1973. I remember it from the early days of the Internet and possibly even from Usenet days.
Among other things it has a large repository of historical documents and papers on C programming and standardization dating back to the 1980s. See [2].
For what it's worth, I bookmarked the latter link in 2001.
[1] http://www.lysator.liu.se/english/
[2] http://www.lysator.liu.se/c/
Don't know, but interestingly the Pike programming language was developed at Linköping University. Perhaps Rob Pike has gained some kind of demigod status there?
If it's anything like most universities I knew (especially in the '90s and early 2000s), then their CS department websites could host all kinds of documents and papers such as this: everything from ESR's stuff to "Worse is Better" and Beej's guides.