top | item 7134798

The descent to C

377 points | coherentpony | 12 years ago | chiark.greenend.org.uk

230 comments

[+] rayiner|12 years ago|reply
The article is good, but I disagree with this part:

"To a large extent, the answer is: C is that way because reality is that way. C is a low-level language, which means that the way things are done in C is very similar to the way they're done by the computer itself. If you were writing machine code, you'd find that most of the discussion above was just as true as it is in C: strings really are very difficult to handle efficiently (and high-level languages only hide that difficulty, they don't remove it), pointer dereferences are always prone to that kind of problem if you don't either code defensively or avoid making any mistakes, and so on."

Not really, and not quite. A lot of the complexity of C when it comes to handling strings and pointers is the result of not having garbage collection. But it does have malloc()/free(), and that's not really any more fundamental or closer to the machine than a garbage collector. A simple garbage collector isn't really any more complicated than a simple manual heap implementation.

And C's computational model is a vast simplification of "reality." "Reality" is a machine that can do 3-4 instructions and 1-2 loads per clock cycle, with a hierarchical memory structure that has several levels with different sizes and performance characteristics, that can handle requests out of order and uses elaborate protocols for cache coherence on multiprocessor machines. C presents a simple "big array of bytes" memory model that totally abstracts all that complexity. And machines go to great lengths to maintain that fiction.

[+] chubot|12 years ago|reply
> And C's computational model is a vast simplification of "reality." "Reality" is a machine that can do 3-4 instructions and 1-2 loads per clock cycle, with a hierarchical memory structure that has several levels with different sizes and performance characteristics, that can handle requests out of order and uses elaborate protocols for cache coherence on multiprocessor machines. C presents a simple "big array of bytes" memory model that totally abstracts all that complexity. And machines go to great lengths to maintain that fiction.

When C was invented, it was very close to reality. It isn't anymore, as you point out. (But as another commenter said, assembly language isn't that close to reality either)

Unfortunately hardware guys and software guys didn't really coordinate, and we just hacked shit up on either side of the instruction set interface.

It is kind of ironic that we write stuff mostly in "serial" languages. But the compiler turns it into a parallelizable data flow graph. Then that's compiled back to a serial ISA. And then the CPU goes and tries to execute it in parallel.

It would be a lot nicer if we wrote stuff in parallel/dataflow languages, and the CPU could understand that! Some dilettantism with FPGAs made me realize how mismatched CPUs are for a lot of modern problems.

It's kind of like the idea that Java throws away its type information when compiling to byte code, and then the JIT reconstructs the types at runtime. We have these encrusted representations that cause so much complexity in the stack. C is (relatively) great, but it's also unfortunately one of these things.

[+] saurik|12 years ago|reply
C doesn't really have malloc or free: those are part of the standard library. You can happily code in C without malloc/free, or you can add a library that provides a garbage-collected malloc. What C's type system provides really is bare metal (although, as you say, bare to the abstraction presented by the machine, not to physical reality), to a much more fundamental extent than even a heap allocator; and claiming you could just swap in garbage collection and then have a string type misses the point entirely.
[+] snicklepuff|12 years ago|reply
> C presents a simple "big array of bytes" memory model that totally abstracts all that complexity.

I don't understand what you mean by this. Machine code itself abstracts away the underlying hierarchical memory structure. Sure, some machine language might have instructions to manipulate the cache, but those are easily invoked from C, using either inline assembly or __builtin functions.

[+] agumonkey|12 years ago|reply
I wonder if someone will design (or already has designed) a low-level language that is async- and cache-hierarchy-aware.
[+] gaius|12 years ago|reply
Agreed; a language like Forth is much closer to the machine, tho' if you really want "the machine" then assembly is the way to go, and writing with a good macro assembler is surprisingly high level. I still pine for the days of Devpac.
[+] Dewie|12 years ago|reply
> "To a large extent, the answer is: C is that way because reality is that way.

I thought C was so widespread that it eventually started to affect how some computer architectures were designed? If so it seems a bit disingenuous to say that it is only dealing with the reality that it was given.

[+] ChuckMcM|12 years ago|reply
Shhhhhh! If you let them know how fun it is then everyone will want to be C programmers :-)

I got to use my crufty C knowledge to useful effect when I discovered that there is no standard system reset on Cortex M chips. That led me to trying to call "reset_handler" (basically the function that kicks things off at startup), which I couldn't do inside an ISR because, lo and behold, there is "magic" in ISRs: they run in "Handler" mode versus "Thread" mode, and jumping to Thread-mode code is just wrong, apparently. C hackery to the rescue: the stack frame is standard, so make a pointer to the first variable in the function, walk backwards up the stack to the return address, change it to the function that should run next, and return. Voila, system reset.

The whole time I am going "Really? I have to look under your covers just to make you do something anyone might want to do?" As a respondent to one of my questions put it "ARM is a mixture of clever ideas intermixed with a healthy dose of WTF?"

[+] eckzow|12 years ago|reply
I don't mean to be a party pooper since I've done my fair share of hacky workarounds in v7-M processors and it is always exhilarating when it works...

But since I'm a Cortex M fanboy I have to defend its reset capabilities :)

For your specific case, try something like

  #include <stdint.h>

  /* AIRCR: Application Interrupt and Reset Control Register */
  volatile uint32_t *AIRCR = (volatile uint32_t *)0xE000ED0C;
  const uint32_t VECTKEY = 0x05FA << 16;  /* writes without this key are ignored */
  const uint32_t SYSRESETREQ = 1 << 2;    /* request a system-level reset */

  void take_reset(void)
  {
    *AIRCR = VECTKEY | SYSRESETREQ;
    __dsb();       /* make sure the store completes */
    while (1);     /* spin until the reset takes effect */
  }
It's technically dependent on external hardware in your processor subsystem, but it should work if your implementer has half a brain (or at least cares enough to read the integration manual). If it doesn't work, please flog your implementer publicly so that I can know to avoid them in the future...

Incidentally, even that tiny code snippet uses a C extension (the __dsb intrinsic), which is either a great example of how C can be wielded to great power (I can generate raw instructions!) or of how C is terribly handicapped (I need a special compiler extension or all my system code is horribly broken!). All depends on point of view, I guess...

Anyway, more info @ page 498, http://web.eecs.umich.edu/~prabal/teaching/eecs373-f10/readi...

[+] diydsp|12 years ago|reply
Exactly! Fortunately, the article didn't point out that one of C's great advantages is that it's so lightweight it can fit into and run on all kinds of interesting systems where other languages are economically infeasible. Cortices rule!
[+] timthorn|12 years ago|reply
Re: your ARM comment - had you noticed who wrote the article?
[+] Svip|12 years ago|reply
I want to be a C programmer now! (I am actually serious, this article has inspired me to try some more C; which I have been seriously neglecting.)
[+] couchand|12 years ago|reply
Cortex as in Cortex Semi?
[+] nathanb|12 years ago|reply
Professional C programmer here...

The points are great and this is generally a good primer for someone who wants to understand the C mindset.

The bit at the end is a bit off, though. It feels like the author is saying "yeah, C is weird and crufty for historical reasons and some people just use it because they're backward like that". Yeah, I write kernel drivers, but I also just plain like using C, for the same reason that I like driving a manual transmission and usually disable the safety features on stuff: C tries really hard to not get in your way.

I enjoy programming in Ruby and mostly enjoy programming in Javascript. But there are times when I think "this is an unnecessary copy...this is inefficient...I wouldn't have to do this if I were writing in C".

(There are also times where I think "this one line of code would be over 100 lines of C", but we won't get into that right now...).

[+] adrianm|12 years ago|reply
I experienced the inner monologue you describe for several years. It haunted my dreams when working with Ruby. But one day a rhetorical thought suddenly dawned on me that has since changed my perspective quite dramatically...

"...if I'm being incessantly bothered by what I perceive as the nagging inefficiencies of some programming language's implementation, maybe I'm not thinking about or relating to programming languages (in the large) in the way I should be..."

If all programming languages are merely tools to communicate instructions to a computer, then why isn't human language viewed by everyone as merely a means to an end as well? Surely most would agree that language is more than simply a means to an end, and that it does far more than transmit information between parties. If efficiency, lack of ambiguity, etc., were the paramount goals of human language, surely formal logic, or perhaps even a programming language for interpersonal communication, would be more fitting than natural language!

So why do we insist on communicating with each other in what is often such an abstract and ambiguity-filled medium?

I'll let wikipedia elaborate on my behalf: http://en.wikipedia.org/wiki/Pragmatics and http://en.wikipedia.org/wiki/Deixis http://en.wikipedia.org/wiki/Literature

tldr; it is trivial, even natural, for a literate individual with the proper context to understand concepts in language that seemingly transcend the words themselves. These notions would be (and are) exceedingly difficult to formalize, and any formal expression of them would cause exponential growth of the output.

Ever try explaining a joke to someone who didn't "get it"? It takes a lot more "space" to convey the same sentiment than to someone who "got it".

So what has this crazy rant have to do with anything? Well, aside from revealing I am a complete nerd, it speaks to my approach to software engineering today.

We have to let go of the machine if we ever want to really move the state of the art forward.

There is an infinitude of expressible ideas, but lacking the proper medium to abstract the expression of these ideas formally (like natural language and our brains do, well, naturally), we will never get a chance to find out what we don't know!

"We're doing it wrong" is not exactly the sentiment I'm trying to express, but it's sorta that. Maybe.

Hope this comment made any sense. :) It's 4 AM after all.

[+] scott_s|12 years ago|reply
Perhaps I can state a point more simply than another poster.

"this is an unnecessary copy...this is inefficient...I wouldn't have to do this if I were writing in C"

You should then ask yourself: does the inefficiency matter? Will it make the program noticeably slower? If not, then you can safely ignore the lack of machine efficiency and embrace the gain in programmer efficiency.

[+] huherto|12 years ago|reply
I fell in love with C 25 years ago, but then I moved into enterprise applications using higher level languages. How is the job market for C programmers? I would imagine that younger programmers don't go that route.
[+] mtdewcmu|12 years ago|reply
The joy of C isn't just in writing it. It's also about getting back a program that runs unreasonably fast at the end. Sometimes you really can tell.
[+] laichzeit0|12 years ago|reply
Recently I had to write a program for an embedded Linux router which ran on a MIPS architecture and had 2 MB of flash. I only had about 40 KB of space to fit the application in. I was able to get a binary that was compiling to more than 1.5 MB down to 20 KB by using a combination of gcc tricks: separating data and code sections, eliminating unused sections, statically linking some libraries and dynamically linking against others. It once again gave me immense appreciation for having a language and toolchain that can give you this power for those 1% of problems your career might depend on.

For amusement, the relevant section of the Makefile I ended up with:

LIB_NL_MIPS_STATIC=libs/libnl_mips_static

CFLAGS=-Os -I$(LIB_NL_BASE)/include -ffunction-sections -fdata-sections -Wall -Wextra -MD

LDFLAGS=-Wl,--gc-sections -L$(LIB_NL_MIPS_STATIC) -Lbuild -Wl,-Bstatic -lmon -lnl-3 -lnl-route-3 -Wl,-Bdynamic -lpthread -lm -lc

I'm unsure how many other languages/toolchains give you that sort of flexibility down to the linking level. Also it's self contained and doesn't require some kind of "virtual machine" or interpreter to run it.

[+] pjmlp|12 years ago|reply
> I'm unsure how many other languages/toolchains give you that sort of flexibility down to the linking level. Also it's self contained and doesn't require some kind of "virtual machine" or interpreter to run it.

Almost every language that has an ahead of time compiler to native code.

[+] userbinator|12 years ago|reply
That makes me wonder why "eliminating unused sections" isn't a default, as it feels like the compiler is doing a lot of unnecessary work if it's generating 1.5M of output that actually has only 20k of useful stuff in it.
[+] rlpb|12 years ago|reply
> But those aren't the reasons why most C code is in C. Mostly, C is important simply because lots of code was written in it before safer languages gained momentum...

I disagree. Certainly in the FLOSS community, I don't think this is true.

C is a lowest common denominator. No higher level language has "won". So if you want the functionality in a library you write to be available to the majority, you will need to make it available to (i.e. provide bindings for) a number of high level languages. The easiest way to do this is to provide a C-level API. This works well because the higher level languages are all implemented in C. This isn't because C is more popular, but because it is a low level language. The easiest way to provide a C-level API is to write the code in C. So: library writers often write implementations in C.

There are three alternatives:

1) Independently implement each individual useful piece of functionality in every high level language. This does happen, but more general implementations tend to move quicker, since they have more users (because they support multiple high level languages) and thus more contributors. The number of contributors might dwindle because of the requirement to code in C, but I don't think this has happened to a significant enough extent yet.

2) Implement libraries in a higher level language and then provide bindings to every other popular higher level language. This can be done, but I haven't seen much of it. Higher level languages seem to make it easier to provide bindings to a C-level API than to APIs written in a different higher level language. This may have something to do with impedance mismatches between higher level language concepts.

3) A higher level language "wins", and everyone moves to such an ecosystem. This can only happen if other higher level languages lose. I don't think there is any sign of this happening.

[+] Tegran|12 years ago|reply
> And there's no simple excuse for the preprocessor; I don't know exactly why that exists, but my guess is that back in the 1970s it was an easy way to get at least an approximation to several desirable language features without having to complicate the actual compiler.

Clearly this guy has never had to deal with a large, complicated code base in C. Dismissing the preprocessor as a crutch for a weak compiler shows a significant ignorance about the useful capabilities that it brings.

[+] i_c_b|12 years ago|reply
I assume when he says "no simple excuse", it's more pointing to the massive problems that the mere existence of the macro pre-processor introduces for reasoning about the text of any C or C++ program, for programmers, tools, and compilers.

I've worked in a code base where, tucked away in a shared header file somewhere up the include chain, a programmer had added the line

#define private public

(because he wanted to do a bunch of reflection techniques on some C++ code, IIRC, and the private keyword was getting in his way)

Now regardless of whether that's a good idea, if you are reading C or C++ code, you always have to be aware, for any line of code you read, of the possibility that someone has done such a thing. Hopefully not, but unless you have recently scanned every line of every include file included in your current context, as well as every line of code preceding the current one in the file you're reading, you just can't know. Clearly this creates giant headaches for compilation and tools as well.

So yeah, of course every mid to large C / C++ program uses the macro pre-processor extensively. You can do useful things with it, and there's no way to turn it off and not use it, anyway, given the way includes work in C / C++, so you might as well take advantage of it.

But it's not an accident that more recent languages have dropped that particular feature.

[+] thirsteh|12 years ago|reply
He wrote PuTTY and all of the assorted tools. I'd say that's a fairly large codebase.

Also, one of the main reasons Go was created was that the authors were tired of the compile times caused largely by preprocessing.

[+] Guthur|12 years ago|reply
I doubt he meant that it was not useful, rather that the usefulness might have been better served as a function of the compiler rather than some disconnected transformation tool.

Of course this thought process would eventually bring you down the road of macro systems such as those in Lisps, but that's going to be more difficult in a language lacking homoiconicity of code and data.

[+] rehack|12 years ago|reply
>7. There is no convenient string type

This is the reason that stops me from going back to C, after coding (mostly) in Java for the past 10 years. I wanted to switch back to C or C++, mainly to save the ton of memory being used, which I think is unwarranted.

So I experimented with a new service and coded it in all three: C, C++ and Java. When I did this I had not coded in C++ for 10 years, but it did not hurt at all; I could switch back with almost no difficulty. There were some minor inconveniences in forgoing the Eclipse editor. I think I missed autocomplete the most.

But within hours after I started, I was getting my previous feeling of the Vi(m) editor coding of C++ back. And with the benefit of having STL (vectors, strings, etc.) I did not feel much discomfort.

But coding the same service in C was painful, mainly because of not being able to do basic things on strings easily, like copy and concatenate.

But thankfully I still managed to do it. And on comparing the three services for latencies and memory usage, I found little difference between C and C++.

So eventually that service was deployed in C++ and still runs the same way.

This episode happened about a year ago, and recently I have been using Go for a lot of services (new ones as well as ports of some old ones). Mainly I have been motivated by the promise of an easier C, which it seems to offer.

Some services coded in Go I have deployed and they are already running very well. But even now I need some more experience on the results side to form a definitive opinion on whether Go is indeed C with a strings lib (and other niceties) for me.

Edit: rephrase for clarity

[+] humanrebar|12 years ago|reply
Most languages get string processing (and its closely related cousin, localization) wrong, even the ones with string classes, so I don't really get my jimmies rustled on C's anemic native string support.

http://www.joelonsoftware.com/articles/Unicode.html

On large enough projects, you end up with all kinds of custom logic around user-entered and user-facing strings, so the lack of native string processing is really only a drawback for tiny and proof-of-concept projects, which aren't really what you use C for anyway.

That being said, the right way to do string processing usually ends up looking a lot uglier than the way we are used to.

[+] theseoafs|12 years ago|reply
Did you use a string library when working in C? C's deficiencies when it comes to string handling are well-known.
[+] lstamour|12 years ago|reply
Knowing a bit of C but often programming in just about any other language, I was recently inspired to work with lower-level languages like C++ thanks to a bunch of talks from Microsoft's Going Native 2013. Specifically Bjarne Stroustrup's The Essence of C++: With Examples in C++84, C++98, C++11, and C++14 -- video and slides at http://channel9.msdn.com/Events/GoingNative/2013/Opening-Key...

C++ really has changed and is changing from what I learned back in university. It's quite exciting. They seem to be standardising and implementing in C++ compilers the way HTML5 is now a living standard with test implementations in browsers. See also: http://channel9.msdn.com/Events/GoingNative/2013/Keynote-Her...

[+] NigelTufnel|12 years ago|reply
There is a great moment in Stroustrup's talk when he shows a short error message in ConceptGcc and the audience applauds.

It seems that Stroustrup was surprised by the applause.

[+] Chromozon|12 years ago|reply
C is a great language- it lets you get down and dirty with the computer.

However, the one huge downside to programming in C is having to deal with strings. Let's face it, C strings are absolutely terrible. For such an important feature, the string implementation of null-terminated char* is just miserable to work with. See: http://queue.acm.org/detail.cfm?id=2010365

[+] michaelhoffman|12 years ago|reply
> If you've used Java or Python, you'll probably be familiar with the idea that some types of data behave differently from others when you assign them from one variable to another. If you write an assignment such as ‘a = b’ where a and b are integers, then you get two independent copies of the same integer: after the assignment, modifying a does not also cause b to change its value.

This is incorrect when it comes to Python. a and b will be two different names for the same integer object, which is stored in a single memory location. The difference is that Python guarantees that integers are immutable.

[+] jffry|12 years ago|reply
Arguably, to a user of the language, these are indistinguishable from independent. Changing one cannot change the other (except perhaps through some exotic double-underscore-prefixed function with which my vague knowledge of Python is unfamiliar).
[+] yason|12 years ago|reply
I remember when I first learned C.

I was 13 and having written assembly for years I finally got a machine that was actually equipped for running a full-blown C compiler. Compiling was slow and the produced code was slow but all I could think of was how easily I could generate [assembly] code with just a few lines of C. Loops, pointers, function calls, conditionals... just like that. Wow. So productive.

C felt like writing assembly but with much better vocabulary. C was to assembly language what English was to the caricatured "ughs" of the stone age.

I often compared the output of the compiler to what I would've written myself: the output was bloaty, the compiler was obviously not very smart, but it did do what I wanted and the computers had just got fast enough to be able to actually run useful programs written in C without slowing down the user experience. So you couldn't necessarily distinguish a program written in C from a program written in assembly, and you could "cheat" by choosing C instead. That was so exciting!

The thing is, however, that since these trivial insights of my youth it turns out that C actually never ran out of juice.

I still write C and I'm enjoying it more than ever.

In C, I've learned to raise the layers of abstraction when necessary and writing C in a good codebase is surprisingly close to writing something like Python except several dozen times faster and you can build your memory layout and little details the best suitable way you want, in various meaningful contexts.

I love doing all the muck that comes with C. String handling, memory management, figuring out the best set of functions on top of which to compose your program, doing the mundane tasks the best way in each case, and never hitting a leaky abstraction like in higher level languages.

The thing is, the time I "waste" doing all that pays me back tenfold as I tend to think about the best way to lay out my program while writing the low-level stuff. Because such effort is required there's a slight cost in writing code which makes you think what you want to write in the first place.

In Python you shove stuff into a few lists and dicts, it just works, and you figure out later what it was that you really wanted and clean it up. But often you're wrong, because it was so easy in the beginning. In C, I have to think about my data structures first, because I don't want to write all that handling again for a different approach. And that makes all the difference in code quality.

However, I don't think you could impose a similar dynamic on a high-level language. There's something in low-level C that makes your brain tick a slightly different way and how you build your creations in C rather than in other languages reflects that. The OP said it very well: C reflects the reality of what your computer does. And I somehow love it just the way it is.

I've worked most of my career in higher level languages but I've never set C aside. It has always been there, even with Python, C++, or some other language. Now I'm writing C again on a regular basis and with my accumulated experience summed into the work it's truly rewarding.

[+] userbinator|12 years ago|reply
I was much the same - started out with Z80 asm, moved onto x86 shortly after that, and never really liked HLLs (including C) until I was almost 18 - I always felt I could do better than the compilers at the time (and I did), so there wasn't any reason to move up. I still use C and x86 asm frequently, more the former now, but I'll sometimes go back to something I wrote in C before and start rewriting bits of it in asm just to see how much smaller I could make it.

> Because such effort is required there's a slight cost in writing code which makes you think what you want to write in the first place.

It also tends to make you think of the simplest, minimal design that works, and that translates into more efficient and straightforward code. Higher level languages make some things really easy, but then I always feel a little disappointed by just how much resources I'm wasting afterwards.

[+] mironathetin|12 years ago|reply
This proves the point: any language is good as long as you really know what you are doing in that language.
[+] zem|12 years ago|reply
if you're not familiar with simon tatham, do poke around his site [http://www.chiark.greenend.org.uk/~sgtatham/] - he has an eclectic and delightful assortment of code and writing. probably best known for putty, but the rest of it is a lot of fun to browse through.
[+] spikels|12 years ago|reply
The article does exactly what it sets out to do: introduce C to programmers used to more modern languages.

I started programming in C again a few months ago after a 15 year hiatus and the language I remembered loving seemed strange and tedious. This would have been a great reminder of the many differences that after a while you just take for granted. Something similar would be useful for most languages but just more so for C (or say, FORTRAN).

My only quibble would be that while malloc/free are covered, many variables are simply allocated and deallocated automatically on the stack. C's dual approach to memory management is yet another frequent source of confusion.

[+] warmwaffles|12 years ago|reply
I love C. It wasn't my first language to jump in to, but it was eye opening to see the power of pointers and low level operations. Java just couldn't get me close enough to the system.
[+] nadam|12 years ago|reply
The article only discusses the 'extremities' (C vs. Python/Java, etc...) when there is an obvious and popular 'compromise': C++, which has most of the discussed advantages of both sides. (Although it has some drawbacks; it is a bit more difficult to master than either C, Java or Python.)
[+] mooreds|12 years ago|reply
As someone who swore off c after a college class and an experience with perl (three cheers for memory management), this was a great intro article to the idioms of c.
[+] jheriko|12 years ago|reply
interesting read. one of the later comments is a bit off the mark though:

" As a direct result of leaving out all the safety checks that other languages include, C code can run faster"

C is fast not just because of missing safety checks but, more generally, because you don't pay for features you don't use. Things like function calls and reading data are not complicated by run-time type logic, for instance. This is very important; it's why you can write an Objective-C class with the same content as a bunch of C functions and the C functions will be (sometimes very significantly) faster.

This is one example, but many language features in high level languages suffer from similar performance problems - by being super generic and ultra late binding they can never perform as fast as a clean implementation which knows everything at compile time.

If you want dynamic late binding type functionality in C you have to do it yourself...

[+] tedchs|12 years ago|reply
What a great explanation. I have been doing some low-level Go programming recently (including implementing the writev syscall), and I think this document would also be useful for Go programmers.
[+] pjmlp|12 years ago|reply
Many of the features people nowadays attribute to C also exist in other imperative languages that compile to native code.

But as many used to be in diapers when compilers for those languages were available, they only know C.

[+] collingreene|12 years ago|reply
This is really great. I have found myself saying some of these same things when explaining things. Going to keep this in my pocket to use in the future. Thanks!