item 1014533

Walter Bright on C's Biggest Mistake

64 points | kssreeram | 16 years ago | dobbscodetalk.com | reply

47 comments

[+] asciilifeform|16 years ago|reply
Lack of array bounds checking is not a problem with C.

It is a problem with our hardware.

Introducing bounds checking without introducing a penalty on array access time is impossible on our "C machines".

C/C++ are often thought of as "close to the metal" - but they are close to particular varieties of metal - those designed to run C/C++. We arrived at them through historical accident. There are many other ways to build a computer - and it is not entirely obvious that a "C architecture" is necessarily the simplest or most efficient:

http://www.loper-os.org/?p=46

That a language which is "close to the metal" is braindead is solely a consequence of braindead metal.

The "C architecture" is a universal standard, to the extent that it has become the definition of a computer to nearly everyone. This is why you will never find the phrase "C architecture" in a computer architecture textbook. And yet it is a set of specific design choices and obsolete compromises, to which there are alternatives.

[+] shadytrees|16 years ago|reply
See also the C FAQ, which patiently devotes 24 questions to the topic. (You can almost tell just how frequently the question came up on the list.)

http://c-faq.com/aryptr/index.html

[+] weaksauce|16 years ago|reply
Do buffer overflows occur anywhere other than in char* buffers with no bounds checking? If not, this single fact is the one that leads to so many of the software vulnerabilities in the wild.
[+] xcombinator|16 years ago|reply
Thank god for this mistake; it makes C what it is: good at low-level programming. It just passes addresses between functions. Light and fast, no abstractions.

I love it: a way of doing assembler-like coding, but multiplatform.

If I want high-level programming I will program in another language, but when you want machine control you have C without all the bloat.

[+] barrkel|16 years ago|reply
There would be little lost from having to write &arr[0], rather than having an array-typed arr decay directly into a pointer, but a huge amount to be gained: some very much needed help with tracking array sizes.
[+] 10ren|16 years ago|reply
C combines the power and performance of assembly language with the flexibility and ease-of-use of assembly language.

However, I was amazed to find that modern assembly language (since I was last in the game 25 years ago) has many high-level concepts in it (structures, loops, conditions etc), and looks... suspiciously... C-like.

But you're quite right about portability. Although C is famously not perfectly portable (int sizes, all those #defines - just some of the issues Java tackled), it is a hell of a lot more portable than actual assembly language. :-)

[+] InclinedPlane|16 years ago|reply
I'd say using null-terminated strings rather than Pascal-style length-embedded strings is C's biggest mistake. It is responsible for so many inefficiencies (strlen is O(n) instead of O(1), as it should be) and, worse yet, so many incredibly serious security vulnerabilities.

All to avoid incurring a 1-3 byte per-string overhead, or having to figure out how to efficiently work around a 255-character limit.

[+] nitrogen|16 years ago|reply
Other than strlen, string operations can be faster with null-terminated strings. Plus, there are other benefits:

If you want to turn some arbitrary data into a string, just put a 0 byte where you want it to end.

Instead of having to increment and compare both a counter and a pointer (or store a final pointer for comparison) in string-manipulation operations, you just increment the pointer. It makes for very concise loops in strchr, etc.

Tokenizing a string is just a matter of throwing down 0 bytes where the tokens are (as strtok does).

Passing a tailing subset of a string to a function is as easy as adding an integer to the string pointer, rather than requiring a memcpy and length calculation.

[+] abrahamsen|16 years ago|reply
It is really an instance of the same problem. Walter Bright's "fat pointer" proposal would give you length-embedded strings for free.
[+] WalterBright|16 years ago|reply
Length-embedded strings are little better than the 0-terminated ones. You cannot take a substring without copying.

The "fat pointer" approach has been used in D for nearly 10 years now, and has proven itself to be very effective.

[+] CrLf|16 years ago|reply
People are permanently trying to "fix" C, but C has nothing to fix.

It is a limited language, constrained both by the circumstances of its creation and by the problem space it has been used in over the years. And that's how it should be.

C is part of an ecosystem of languages; it doesn't have to be changed to accommodate the latest fads, or to fix problems that never stopped it from being widely used for decades.

If C doesn't fit a purpose, don't use it. You don't even have to stray too far, since there are a few languages that basically are just C with extras.

[+] gchpaco|16 years ago|reply
We have been awash for years in buffer overflows and other, similar errors (printf format strings come to mind) that are simply impossible in a safer language. SQL injection can happen in a safer language, but you can't take over the web server by doing it. There is nothing fundamental about systems languages that requires unsafe array operations. This is a flaw, and it is a flaw of C specifically, inherited by many C-descended languages. Nor is it some ivory-tower thing discovered after C was designed; it was apparent even at the time (although Pascal's fix was pretty bad, variable-length arrays fix it neatly). There are compiler articles from the late 70s and early 80s pointing out how even a naïve compiler could easily optimize out bounds checking in most operations!
[+] InclinedPlane|16 years ago|reply
People are trying to create a language that fits the niche that C is supposed to fill in today's world but just barely misses the mark (fast, low-level, but still sane and more or less modern). To a lot of people it looks like the easiest way to create a language that fills that niche is to fix the bugs in C rather than create something new from scratch.
[+] Luyt|16 years ago|reply
When I was reading this, I thought "Nooo! Don't make the size of an array part of its type!" That has rightfully been shown to be a very bad idea by Brian Kernighan; see http://www.lysator.liu.se/c/bwk-on-pascal.html Luckily the proposal is about passing a 'fat pointer', really a pointer and a length. I did that often in my C programs too: int process(char *buf, int buflen);

Maybe this fix to C's Biggest Mistake, a.k.a. the 'fat pointer', is just syntactic sugar.

[+] nimrody|16 years ago|reply
I don't think Walter suggested having the array size part of the type (static array types).

He suggested using "fat pointers" -- pointers along with their extent. This is similar to how many Pascal compilers treat the type "String".

Kernighan mentions Pascal strings in his article but claims the solution does not scale to other types. Walter's solution does work for all array types (but admittedly has other problems).

[+] chmike|16 years ago|reply
Making the size part of the type is a useful feature in some cases. The compiler can benefit from this information to optimize the data structure and its manipulation.

The exponential growth of CPU processing power has led us to discard such optimizations in favor of code simplicity. But handheld devices with limited energy, computing, and storage capacity may put them back into perspective.

[+] kssreeram|16 years ago|reply
I feel the lack of a module system is the biggest mistake in C. It is tiresome to prefix every single public function: list_append, list_delete, hashmap_insert etc.
[+] coliveira|16 years ago|reply
I think the preprocessor is the biggest mistake. It was introduced to address issues of separate compilation in a simple way, but it generated more trouble than advantages.
[+] __david__|16 years ago|reply
I understand the issues with the pre-processor, but I still think C is better off with it than without it. I know it can be abused horrifically, but it can also be used in really nice and convenient ways. If you construct your macros right they can be useful or even wonderful. In that sense the pre-processor is very C-like.

The things that make macros actually nice are some of the gcc extensions, like the ({ }) statement-expression syntax and typeof().

[+] agazso|16 years ago|reply
Macros can easily be abused, but the point of the preprocessor today is platform-dependent code generation. In higher-level languages you can select with an if which branch gets executed, but in the end the platform-dependent code of that language's runtime is written in C with ugly ifdefs.
[+] vorador|16 years ago|reply
What issues does the preprocessor have?
[+] duairc|16 years ago|reply
I disagree. I've only recently started programming in C properly, having previously programmed only in various dynamic scripting languages, Haskell, and Java. Macros are by far my favourite feature of C.
[+] rbranson|16 years ago|reply
I don't think arrays are "converted" to pointers. Arrays are simply a cleaner way of doing pointer arithmetic and allocating large(r) blocks on the stack. Nothing is lost in this "conversion." The array never knows its own dimensions beyond the point where you declare it. It's up to the developer to keep track of that.
[+] agazso|16 years ago|reply
Here is an example that caused me a few bugs.

  // define a new type called md5_t
  typedef char md5_t[33];
  md5_t g_md5;
  // here sizeof(g_md5) == sizeof(md5_t)
  
  void f(md5_t md5)
  {
    // here sizeof(md5) == sizeof(char*)
  }
The type information is definitely lost inside functions.
[+] ori_b|16 years ago|reply
Arrays are converted to pointers on the first use. You can't ever use an array in C.
[+] giardini|16 years ago|reply
C's biggest mistake would have to be C++.