Lack of array bounds checking is not a problem with C.
It is a problem with our hardware.
Introducing bounds checking without introducing a penalty on array access time is impossible on our "C machines".
C/C++ are often thought of as "close to the metal" - but they are close to particular varieties of metal - those designed to run C/C++. We arrived at them through historical accident. There are many other ways to build a computer - and it is not entirely obvious that a "C architecture" is necessarily the simplest or most efficient:
http://www.loper-os.org/?p=46
That a language which is "close to the metal" is braindead is solely a consequence of braindead metal.
The "C architecture" is a universal standard, to the extent that it has become the definition of a computer to nearly everyone. This is why you will never find the phrase "C architecture" in a computer architecture textbook. And yet it is a set of specific design choices and obsolete compromises, to which there are alternatives.
Do buffer overflows occur anywhere other than in char* buffers with no bounds checking? If not, this single fact is the one behind so many of the software vulnerabilities in the wild.
Thank god for this mistake; it is what makes C good at what it does: low-level programming. It just passes addresses between functions. Light and fast, no abstractions.
I love it: a way of writing assembler-like code, but multiplatform.
If I want high-level programming I will use another language, but when you want machine control you have C, without all the bloat.
There would be little lost from having to specify &arr[0], rather than having an array-typed arr decay into a pointer directly, but a huge amount to be gained - some very much needed help with tracking array sizes.
C combines the power and performance of assembly language with the flexibility and ease-of-use of assembly language.
However, I was amazed to find that modern assembly language (since I was last in the game 25 years ago) has many high-level concepts in it (structures, loops, conditions etc), and looks... suspiciously... C-like.
But you're quite right about portability. Although C is famously not perfectly portable (int sizes, all those #defines - just some of the issues Java tackled), it is a hell of a lot more portable than an actual assembly language. :-)
I'd say using null-terminated strings rather than Pascal-style length-prefixed strings is C's biggest mistake. Responsible for so many inefficiencies (strlen is O(n) instead of O(1) as it should be) and, worse yet, so many incredibly serious security vulnerabilities.
All to avoid having to incur a 1-3 byte per string overhead or figuring out how to efficiently work around a 255 character limit.
Other than strlen, string operations can be faster with null-terminated strings. Plus, there are other benefits:
If you want to turn some arbitrary data into a string, just put a 0 byte where you want it to end.
Instead of having to increment and compare both a counter and a pointer (or store a final pointer for comparison) in string-manipulation operations, you just increment the pointer. It makes for very concise loops in strchr, etc.
Tokenizing a string is just a matter of throwing down 0 bytes where the tokens are (as strtok does).
Passing a tailing subset of a string to a function is as easy as adding an integer to the string pointer, rather than requiring a memcpy and length calculation.
People are permanently trying to "fix" C, but C has nothing to fix.
It is a limited language, both by the constraints at the time of its creation, but also by the problem space where it has been used over the years. And that's how it should be.
C is part of an ecosystem of languages; it doesn't have to be changed to accommodate the latest fads, or to fix problems that nevertheless never stopped it from being widely used for decades.
If C doesn't fit a purpose, don't use it. You don't even have to stray too far, since there are a few languages that basically are just C with extras.
We have been awash for years in buffer overflows and other, similar errors (printf format strings come to mind) that are simply impossible in a safer language. SQL injection can happen in a safer language, but you can't take over the web server by doing it. There is nothing fundamental about systems languages that requires unsafe array operations. This is a flaw, and it is a flaw of C specifically, inherited by many C-descended languages. Nor is it some ivory-tower thing discovered after C was designed; it was apparent even at the time (although Pascal's fix was pretty bad, variable-length arrays fix it neatly). There are compiler articles from the late 70s and early 80s pointing out how even a naïve compiler could easily optimize out bounds checking in most operations!
People are trying to create a language that fits the niche that C is supposed to fill in today's world but just barely misses the mark (fast, low-level, but still sane and more or less modern). To a lot of people it looks like the easiest way to create a language that fills that niche is to fix the bugs in C rather than create something new from scratch.
When I was reading this, I thought "Nooo! Don't make the size of an array part of its type!" That was rightly shown to be a very bad idea by Brian Kernighan; see http://www.lysator.liu.se/c/bwk-on-pascal.html
Luckily the proposal is about passing a 'fat pointer', really a pointer and a length. I did that often in my C programs too: int process(char *buf, int buflen);
Maybe this fix to C's Biggest Mistake, a.k.a. the 'fat pointer', is just syntactic sugar.
I don't think Walter suggested having the array size part of the type (static array types).
He suggested using "fat pointers" -- pointers along with their extent. This is similar to how many Pascal compilers treat the type "String".
Kernighan mentions Pascal strings in his article but claims the solution does not scale to other types. Walter's solution does work for all array types (but admittedly has other problems).
Making the size part of the type is a useful feature in some cases. The compiler can use this information to optimize the data structure and its manipulation.
The exponential growth of CPU processing power has led to such optimizations being discarded in favor of code simplicity. But handheld devices with limited energy, computing and storage capacity may put them back into perspective.
I feel the lack of a module system is the biggest mistake in C. It is tiresome to prefix every single public function: list_append, list_delete, hashmap_insert etc.
I think the preprocessor is the biggest mistake. It was introduced to address issues of separate compilation in a simple way, but it generated more trouble than advantages.
I understand the issues with the pre-processor, but I still think C is better off with it than without it. I know it can be abused horrifically, but it can also be abused in really nice and convenient ways. If you construct your macros right they can be useful or even wonderful. In that sense the pre-processor is very C-like.
The things that make macros actually nice are some of the gcc extensions, like the ({ }) block-expression syntax and typeof().
Macros can easily be abused, but the point of the preprocessor today is platform-dependent code generation. In higher-level languages you can select with an if which branch gets executed, but in the end the platform-dependent code of that language's runtime is written in C, with ugly ifdef's.
I disagree. I've only recently started programming in C properly, having previously programmed only in various dynamic scripting languages, Haskell and Java. Macros are by far my favourite feature of C.
I don't think arrays are "converted" to pointers. Arrays are simply a cleaner way of doing pointer arithmetic and allocating large(r) blocks of the stack. Nothing is lost in this "conversion." The array never knows its own dimensions beyond the time you declare it. It's up to the developer to keep track of that.
// define a new type called md5_t
typedef char md5_t[33];
md5_t g_md5;
// here sizeof(g_md5) == sizeof(md5_t)
void f(md5_t md5)
{
// here sizeof(md5) == sizeof(char*)
}
The type information is definitely lost inside functions.
shadytrees: http://c-faq.com/aryptr/index.html
WalterBright: The "fat pointer" approach has been used in D for nearly 10 years now, and has proven itself to be very effective.