
Some Were Meant For C (2017) [pdf]

105 points | fractalb | 5 years ago | cs.kent.ac.uk

186 comments

[+] thesuperbigfrog|5 years ago|reply
C and C++ code CAN be secure, but most of it is not. It is too easy to write or update C / C++ code so that it is no longer secure or has unexpected and unsafe results.

https://blog.regehr.org/archives/213 gives great insights into how undefined behavior in C and C++ can be difficult to reason about and cause problems.

The lack of bulletproof memory safety and the easy-to-stray-into undefined behavior of C and C++ make it easy to write code whose behavior is difficult to fully grasp, especially when optimizing compilers are used. C / C++ code runs really fast, but there are hidden dangers lurking.

I don't doubt that C and C++ will be with us for a long time to come, but the growing use of Rust, Zig, Ada, and others shows that better alternatives exist and that they will replace C and C++ in many domains and use cases.

Edit: Downvotes? Did you read my whole comment? I am saying that C / C++ are not secure for real-world use cases.

[+] jerf|5 years ago|reply
This is one of those rare cases where I can say "It's 2021, and we know that's not true now." C and C++ cannot be secure at scale without unreasonable amounts of effort. They can't be secure at any non-trivial scale through sheer discipline alone.

40 years ago, the case could be made. But C and C++ are not new languages, and the fact that just barely shy of no one can demonstrate the existence of secure C or C++ code bases without staggering levels of effort put into that process is data now, not just anecdote.

(And let me emphasize the effort as my yardstick. Writing truly secure code is arguably something nobody has ever done at scale in any language... but C and C++ are certainly unique in the sheer level of effort that must be poured into them to even match what a number of other languages come with out of the box, let alone exceed them. If you aren't using some very high quality and fairly expensive tools like Coverity on a routine basis, you aren't even close.)

[+] kzhukov|5 years ago|reply
Security is not a feature of the language or tool. Neither Rust nor C++ is fully secure, even though the former can catch more memory safety problems at compile time (but not all of them).

Security is a process. It involves continuous risk assessment, penetration testing, fuzzing, and using various other tools throughout product development to eliminate attack vectors. Only then can you build a secure product. Just rewriting everything in Rust won't do it.

[+] icandoit|5 years ago|reply
If you want to do something about it look into UBSan. Turn vague concerns into bugs, and then into commits :).

https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html

UndefinedBehaviorSanitizer (UBSan) is a fast undefined behavior detector. UBSan modifies the program at compile-time to catch various kinds of undefined behavior during program execution, for example:

- Using misaligned or null pointer

- Signed integer overflow

- Conversion to, from, or between floating-point types which would overflow the destination

GCC has similar features.
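
As a small illustration of the signed integer overflow case in the list above, here is a minimal sketch in C. The helper name `checked_add` is hypothetical and not part of UBSan. A plain `a + b` that overflows is the kind of operation that a build with `-fsanitize=undefined` reports at run time; the helper below sidesteps it by checking the range before adding.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: adds a and b without ever evaluating an
 * overflowing signed addition (which would be undefined behavior).
 * Returns true and stores the sum in *out on success; returns false
 * and leaves *out untouched if the sum does not fit in an int. */
bool checked_add(int a, int b, int *out)
{
    if (b > 0 && a > INT_MAX - b) return false;  /* would overflow upward */
    if (b < 0 && a < INT_MIN - b) return false;  /* would overflow downward */
    *out = a + b;
    return true;
}
```

A caller would test the return value before using the result, for example `if (!checked_add(x, y, &sum)) { /* handle the error */ }`.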

[+] jandrewrogers|5 years ago|reply
C and C++ are not the same case. Modern C++ supports fairly strict and exhaustive type safety that can robustly evaluate many types of safety at compile-time, if you choose to use it to its full capability. C doesn't even have these facilities.

Anecdotally, I haven't seen an open source C++ code base that uses type safety to that extent, but it is certainly possible.

[+] Arch-TK|5 years ago|reply
Rust, Zig and Ada code CAN be secure, but most of it is not. It is too easy to write or update Rust, Zig or Ada code so that it is no longer secure or has unexpected and unsafe results.

The lack of formal verification features and the easy-to-stray-into logic errors in Rust, Zig and Ada make it easy to write code whose behavior is difficult to fully grasp, especially when large projects are concerned. Rust, Zig and Ada code runs really fast and is usually memory safe, but there are hidden dangers lurking.

I don't doubt that Rust, Zig and Ada will become more popular in time to come, but formally verifiable languages such as Verifiable C or Spark are actually "safe" in some meaningful sense of the term instead of giving everyone a false sense of "safety" in the form of memory safety features as if memory safety errors were somehow the only class of security critical error.

[+] bawolff|5 years ago|reply
> C and C++ code CAN be secure

Anything can be secure (and conversely anything can be insecure). The theoretical potential doesn't matter because real life is never the theoretical best case. What matters is the overall risk (is likelihood * how bad < benefit?)

[+] liquidify|5 years ago|reply
I think they can easily be made secure. You just have to be willing to strongly type everything. It doesn't mean it would be conventional or easy, but it would eliminate most type safety issues that crop up during refactoring.
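
One way to read "strongly type everything" in plain C is to wrap scalars in distinct struct types. This is only a sketch, and every name in it is hypothetical: the three wrappers all hold a `double`, but the compiler treats them as incompatible types.

```c
/* Distinct wrapper types: each holds a double, but passing one where
 * another is expected fails to compile. All names are hypothetical. */
typedef struct { double value; } meters_t;
typedef struct { double value; } seconds_t;
typedef struct { double value; } meters_per_second_t;

/* The signature documents and enforces which quantity goes where. */
static inline meters_per_second_t speed(meters_t distance, seconds_t elapsed)
{
    meters_per_second_t result = { distance.value / elapsed.value };
    return result;
}
```

With plain `double` parameters, swapping the two arguments would compile silently; with the wrappers, `speed(elapsed, distance)` is rejected at compile time.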
[+] Koshkin|5 years ago|reply
I have a suspicion that “safe” and “secure” are two different things.
[+] cogburnd02|5 years ago|reply
> C / C++ are not secure for real-world use cases.

Is the Linux kernel not a real-world use case?

[+] nec4b|5 years ago|reply
Of course they are secure. What language do you think was used for developing software for airplanes, cars, medical equipment and so on? C can be formally verified while Rust currently can't be.
[+] coliveira|5 years ago|reply
Undefined behavior is not the problem that people make it out to be. First of all, undefined behavior is well understood by compilers; in fact, it is exactly what compilers exploit to make code run faster. The only thing you need to do to solve UB is to ask compilers to stop exploiting it (which normally can be done by reducing optimization). And of course you can rewrite your code to stop relying on UB. Despite all the complaints, I have never seen code suffering from UB that couldn't be fixed.
[+] jvanderbot|5 years ago|reply
I've entered a mid-life zen w.r.t. languages. Most important is the people and irreplaceable knowledge they have in their heads, about the tricks, methods, and environments they've worked in. Languages correlate with that, and so are a semi-useful indicator of past skills. You want a person to write the bootloader for your Mars helicopter? You hire a C programmer most likely, not a C# programmer, but who knows? You want a person to bootstrap your image processing pipelines for your scientists? Maybe Python? Maybe? This area is much more loose.

If a valuable tool or method is expressed in (or even only expressible in) a particular language, then so be it. Often, there are many more choices than people believe, and what is right for the person, so long as it serves the organization or need appropriately, is fine by me.

A language is a tool. Most languages do pretty much the same thing. Most languages' ecosystems, application adoption, and developers are much more important than the languages' seatbelts and headlights.

[+] derekp7|5 years ago|reply
What is also important is the ecosystem that comes along with the language. Modern languages have a wide-ranging list of libraries and plugins which make certain programming tasks easier. But that also means you have an ever-shifting stack that is required to support them. This became apparent to me when trying to get some infrastructure software to play nice with some of the older systems we need to keep around. There were some nice Python-based solutions that I had to reject because the dependencies didn't exist for some older RHEL installations we have (yes, we still need to keep a handful of RHEL 4.x systems around because management doesn't want to tell customers "no we won't support you unless you upgrade to our latest product that works on newer OS releases"). So for backup and management solutions, I have to stick with tools that can be easily compiled on the older environments.

Another example, one of our architects at work is a strong supporter of Apple, and wanted me to look into Swift. Well at the time you could get Swift for Ubuntu, but couldn't for any version of RHEL (that has finally changed now though). So again, writing my code in plain C was more of a win.

[+] qznc|5 years ago|reply
I disagree that languages do "pretty much the same thing". However, I do agree that language choice is overrated. Other factors, e.g. which one you know better, weigh heavier.
[+] RcouF1uZ4gsC|5 years ago|reply
> Most obviously in C, we note that a malloc() implementation is usually written in C—or rather, in a subset of C that lacks malloc() since malloc() is mandated by the C standard.

Not always. Google’s tcmalloc is actually written in C++.

I actually would have agreed with a lot of this in 2017. However, in the past several years, I think Rust has been a game changer. It can interface with the C ABI. It can access the same low-level abstract machine that C can. It doesn’t have garbage collection or virtual machines. And it provides memory safety out of the box. In addition, features such as strong types and pattern matching help reduce logic bugs as well (like the compiler checking that you did not forget one arm of an enum).

This is borne out now that a lot of security-facing software is starting to do at least part of its internals in Rust, where it had been in C before.

I think Rust is, and will increasingly be, a game changer in how we write the foundation programs and libraries that the computing world is built on.

[+] bachmeier|5 years ago|reply
I remember when this paper came out, I tweeted at the author about D's "Better C" mode since he used that very term. It really is a better C, because C is almost a subset of D. You get nice features like array bounds checking, but no runtime, no garbage collector, etc. It's a good choice for those that prefer to stick with C but wish there was a 2021 upgrade.

https://dlang.org/spec/betterc.html

[+] pharke|5 years ago|reply
What's the developer experience like in D? I keep looking at it from time to time but haven't taken the time to learn it (even though I've spent time learning a lot of new languages). I was never sure if there was a big enough community around D to make it worthwhile, but it seems to have a lot of the features I want.
[+] ducktective|5 years ago|reply
Who are the designers/institution behind D, and why have they not promoted it as much as newer alternatives?
[+] 0xdeadfeed|5 years ago|reply
NASA just sent a rover to Mars using software written in C. Meanwhile some Rust fanatics are busy telling everyone how it doesn’t work.
[+] mhh__|5 years ago|reply
Wat

The types of analysis and programming practices used to send stuff to Mars are beyond what Rust, or D, or any other safer-systems-language tries to do. It's not that simple.

These types of projects effectively need to prove the absence of bugs using formal verification and very extensive testing. Surprise surprise, C makes it extremely expensive and theoretically difficult too.

For example: NASA wrote this project https://github.com/NASA-SW-VnV/ikos which uses abstract interpretation and would catch bugs in practically any language.

[+] barongrounds|5 years ago|reply
The Rover is not connected to the internet.

Do you know what subset of C NASA limits itself to? Or hw architecture? The rigour of their testing? Should all C developers follow the same restrictions as NASA?

[+] shipp02|5 years ago|reply
Rovers usually have a timeline of 7-8 years for designing and building. Rust had not hit 1.0 at the time NASA probably started designing the rover.

So any indication of what NASA would use on its rovers has to be taken from projects that started after Rust released 1.0.

[+] rurban|5 years ago|reply
According to my information the rover software is in C++. The OS is in C. The C++ classes are mostly autogenerated by Python.
[+] adyavanapalli|5 years ago|reply
The author's conclusion:

I have argued that C’s enduring popularity is wrongly ascribed to performance concerns; in reality one large component of it (the “application” component) owes to decades-old gaps in migration and integration support among proposed alternatives; another large component of it (the “systems” component) owes to a fundamental and distinctive property of the language which I have called its communicativity, and for which neither migration nor integration can be sufficient. I have also argued that the problems symptomatic of C code today are wrongly ascribed to the C language; in reality they relate to its implementations, and where for each problem the research literature presents compelling alternative implementation approaches. From this, many of the orthodox attitudes around C are ill-founded. There is no particular need to rewrite existing C code, provided the same benefit can be obtained more cheaply by alternative implementations of C. Nor is there a need to abandon C as a legitimate choice of language for new code, since C’s distinctive features offer unique value in some cases. The equivocation of “managed” with “safe” implementations, and indeed the confusion of languages with their implementations, have obscured these points. Rather than abandoning C and simply embracing new languages implemented along established, contemporary lines, I believe a more feasible path to our desired ends lies in both better and materially different implementations of both C and non-C languages alike. These implementations must subscribe to different principles, emphasising heterarchy, plurality and co-existence, placing higher premium on the concerns of (in application code) migration and interoperation, and (in the case of systems code) communicativity.
My concrete suggestions—in particular, to implement a “safe C”, and to focus attention on communicativity issues in this and any proposed “better C”—remain unproven, and perhaps serve better as the beginning of a thought process than as a certain destination. C is far from sacred, and I look forward to its replacements—but they must not forget the importance of communicating with aliens.

[+] AlbertoGP|5 years ago|reply
Relevant to recent discussions, even if it was published in 2017.

It is considerably more elaborate than other publications I’ve seen mentioned in those discussions.

I’ll quote section 6.2, “What is Safety Anyway?”:

> I have learned to enjoy provoking indignant incredulity by claiming that C can be implemented safely. It usually transpires that the audience have so strongly associated “safe” with “not like C” that certain knots need careful unpicking.

> In fact, the very “unsafety” of C is based on an unfortunate conflation of the language itself with how it is implemented.

> Working from first principles, it is not hard to imagine a safe C. As Krishnamurthi and Felleisen [1999] elaborated, safety is about catching errors immediately and cleanly rather than gradually and corruptingly. Ungar et al. [2005] echoed this by defining a safety property as “the behavior of any program, correct or not, can be easily understood in terms of the source-level language semantics”—that is, with a clean error report, not the arbitrary continuation of execution after the point of the error.
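
To make "catching errors immediately and cleanly" a bit more concrete, here is a minimal sketch in C of a bounds-checked read. It is not the paper's implementation; the `int_slice` type and `slice_get` function are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat" array: the length travels with the pointer. */
typedef struct {
    int   *data;
    size_t len;
} int_slice;

/* Reads element i into *out. An out-of-range index is reported
 * immediately through the return value instead of reading stray
 * memory; *out is left untouched in that case. */
bool slice_get(int_slice s, size_t i, int *out)
{
    if (i >= s.len) return false;
    *out = s.data[i];
    return true;
}
```

The error surfaces at the point of the bad access and in source-level terms (an index past the length), rather than as corruption that shows up later.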

[+] microtherion|5 years ago|reply
The paper starts with a series of snippets designed to show off the unique abilities of C that are chock full of undefined behavior, and then argues that C can be used safely if only programmers and compiler writers were to agree to stay away from undefined behavior. I'm getting mixed messages here…

And the title, to me, evokes William Blake: "Some are Born to Endless Night".

[+] coliveira|5 years ago|reply
The author raises an important issue here: many people are led to believe that using C is inherently unsafe. That's not true, and many of the most secure systems in the world were written in C. The other direction also doesn't work: software written in languages like Java can be effectively unsafe.
[+] pjmlp|5 years ago|reply
One of the most secure OSes is ClearPath MCP, which has zero lines of C in its kernel; it is written in NEWP instead.

Azure Sphere, Solaris and latest versions of iOS all rely on some variation of hardware memory tagging to tame C exploits.

[+] bitwize|5 years ago|reply
Once again.

We know from 40 years of discovering memory-related vulnerabilities in even the most carefully written, rigorously tested C programs that writing safe C is intractable for real, human software engineers. So yes, C IS INHERENTLY UNSAFE. If you claim otherwise you clearly haven't been paying attention to what's going on.

[+] jxy|5 years ago|reply
> A final interesting property of this code is that its behaviour is undefined according to the C language standard. The reason is that it calls memcpy() across a range of memory comprising multiple distinct C objects, copying them all into memory-mapped storage in a single operation.

What's wrong with memcpy here? As long as dst and src are both non-null and the ranges of memory are not overlapping, the behavior of memcpy is well defined.
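
For reference, the two preconditions named above can be sketched as a small wrapper in C. The function name `copy_checked` is hypothetical, and comparing addresses through `uintptr_t` assumes a flat address space.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper: calls memcpy only when both pointers are
 * non-null and the two n-byte ranges do not overlap. Addresses are
 * compared as uintptr_t because relational comparison of pointers
 * into different objects is itself undefined in C. */
bool copy_checked(void *dst, const void *src, size_t n)
{
    if (dst == NULL || src == NULL) return false;
    uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
    bool overlap = (d < s) ? (s - d < n) : (d - s < n);
    if (overlap) return false;
    memcpy(dst, src, n);
    return true;
}
```

This checks only those two conditions. It says nothing about whether the source range spans several distinct C objects, which is the concern raised in the quoted passage.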

[+] rurban|5 years ago|reply
Pointers into a copied region are invalidated. Only GC languages do such memory copying safely.

It's also insanely slow with gcc, compared to clang. Like factor 1000 with some compile-time constants.

[+] MaysonL|5 years ago|reply
Structure alignments…
[+] dleslie|5 years ago|reply
All I really want added to C is Zig's comptime.
[+] zokier|5 years ago|reply
Is the second example (the auxv stuff in section 5.1) invoking undefined behavior here: "at_null->a_type == AT_NULL"? As far as I understand, in C you generally cannot pull a valid pointer out of thin air like the author is doing there. Isn't that the whole idea behind "pointer provenance"?
[+] khuey|5 years ago|reply
It's not pulled out of thin air, it's constructed from `environ`.
[+] ciarcode|5 years ago|reply
Can someone explain to me why we use C for writing code for the electronic control unit of a vehicle motor if it is so unsafe? It is true that ECUs are programmed with code generated through model-based design, but some parts may be programmed manually. Maybe this is why they use only a subset of C (MISRA C).
[+] GeorgeTirebiter|5 years ago|reply
I've written code for ECUs. MISRA C forces a straitjacket on C to keep away from dark corners, and rather enforces a 'bland' C style that is easy for some other engineer to understand. At first, one complains about the details; then, they become built-into-your-brain 'macros' so they are no longer (much of) an impediment.

There is one other reason, and that's until recently, auto-qual MPUs with fancy floating-point units (or any floating point units) were very rare; hacks such as Qm.n notation were required to do anything semi-fancy with trig etc. This was true even 5 years ago, although I would hope by now 'decent' auto-qual parts that are cheap enough exist.
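
For readers who haven't met Qm.n notation, here is a minimal sketch of Q16.16 fixed-point arithmetic in C. The choice of Q16.16 and all names are assumptions for illustration, not taken from any particular ECU code base.

```c
#include <stdint.h>

/* Q16.16: 16 integer bits and 16 fractional bits in an int32_t.
 * The format choice and all names are assumptions for illustration. */
typedef int32_t q16_16;

#define Q16_ONE ((q16_16)1 << 16)   /* the value 1.0 in Q16.16 */

/* Converts a small integer to Q16.16; the caller must keep x within
 * -32768..32767 so the multiplication does not overflow. */
static inline q16_16 q16_from_int(int32_t x)
{
    return (q16_16)(x * Q16_ONE);
}

/* Multiplies in 64 bits, then divides out the extra 2^16 scale factor.
 * No saturation: the caller must ensure the result fits in Q16.16. */
static inline q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * (int64_t)b) / Q16_ONE);
}
```

For example, `q16_mul(q16_from_int(3), Q16_ONE / 2)` yields the Q16.16 encoding of 1.5, using only integer instructions.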

Lastly, there are a boatload of requirements for auto software; you have to fail-safe as your power is going away (e.g. a crash is happening). You need to have get reset to any value, and your CPU needs to detect something is wrong and reset itself. There are different failure tests for different subsystems; 'body' electronics isn't quite as stringent as propulsion.

There is also a MISRA C++ spec; I was unsuccessful getting even a pilot project with C++, as it's also substantially simplified in the 'legal' subset; and is rather nicer than C in many ways. But.... C is going to be with us for decades more, I think.

[+] ironman1478|5 years ago|reply
Probably because of

1. Inertia. It's already being used, so why not keep using it? I can't prove this, but it feels true

2. Lots of microcontroller vendors provide tooling for it. You are basically guaranteed to be able to run C on whatever microcontroller you want. Even if you don't get a standard library, you can implement your own if you need to. Languages like C++ have large runtimes and are hard to port to lots of platforms

3. It's an "easy" language to use for people who work on these devices and the problems of an ECU are not the problems of a database for example. I've worked on large enterprise databases and now I work on cameras for a car-ish company. The scale is just different and you simply don't need as many features as in C++, Haskell, Rust, Python. The hard part on a small embedded system is not the coding per se, it's the algorithms. A lot of these devices work at fixed rates on a timer (or respond to a very periodic interrupt), have a well understood amount of work to do in a fixed window of time, and then go back to sleep until the next job is queued. Memory safety won't help you when critical failure is defined as being late to respond. Would I love to have total memory safety at the moment? Totally! But it's the least of my concerns at the moment and if my toolchain doesn't support it, then oh well we just have to code to the standards of MISRA and follow standards like ISO 26262.

I should say, writing C on a large project sucks. I've seen it before and I personally hate it (large portions of the database were in C). It's just too complicated and while C++ isn't perfect, having things like destructors and templates is really nice and just makes the problems tractable.

[+] cozzyd|5 years ago|reply
Usually in embedded development you don't ever want to use dynamic allocation. Very few languages allow you to avoid dynamic allocation.
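
A common way to avoid dynamic allocation in C is a fixed-size pool whose storage is reserved at compile time. The sketch below is one possible shape; the `sample_t` type, the pool size, and the function names are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4   /* hypothetical capacity, fixed at compile time */

/* Hypothetical payload type. */
typedef struct {
    int   id;
    float reading;
} sample_t;

/* All storage is static: no malloc, no heap, no fragmentation. */
static sample_t pool[POOL_SIZE];
static bool     pool_in_use[POOL_SIZE];

/* Returns a free slot, or NULL when the pool is exhausted. */
sample_t *sample_alloc(void)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool_in_use[i]) {
            pool_in_use[i] = true;
            return &pool[i];
        }
    }
    return NULL;
}

/* Marks the slot as free again; pointers not from the pool are ignored. */
void sample_free(sample_t *p)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (p == &pool[i]) {
            pool_in_use[i] = false;
            return;
        }
    }
}
```

Worst-case memory use is known at link time, and running out of slots becomes an explicit `NULL` the caller has to handle.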
[+] nec4b|5 years ago|reply
I would like to mention that C is certified for writing safety-critical code while Rust isn't. Therefore critical systems for aerospace, the auto industry and other special industries are mostly written in C.
[+] coliveira|5 years ago|reply
I believe the main issue at play here is a paradox in software engineering. The paradox is this: safe languages are more useful on complex projects than in simple projects, but complex projects suffer more from performance degradation and system integration issues when these languages are used. Put from the other side, C is perfectly fine to write short pieces of code, but despite its problems it may be the only reasonable language to write large pieces of system software (I'm including C++ here as a "kind of" C, just like Objective-C).
[+] vmchale|5 years ago|reply
> safe languages are more useful on complex projects than in simple projects, but complex projects suffer more from performance degradation

I don't think that's true. C is often slower than C++ because of how inlining works, plus in some domains (e.g. compilers) it's best to just use a GC language from the get-go.

[+] xianwen|5 years ago|reply
But are there guides or books that teach people to write safe C code? Is writing safe C code possible?
[+] FpUser|5 years ago|reply
>"...I use because I’m stuck with it; I use it for positive reasons. "

I absolutely agree with that statement. When I do firmware for small MCUs I feel big fat zero need for any other language.

[+] jqcoffey|5 years ago|reply
Isn't OpenBSD written in C? w.r.t. "unsafe", just sayin'.
[+] ncmncm|5 years ago|reply
The article mentions C++ three times, all in contexts equating it with C. But C++, as it is coded today, is a very different language from C, and does not suffer from the problems that make C an extremely poor choice for starting any new project that might matter.

All the article's arguments for C apply substantially more so to C++. Thus, the article leaves us with no objectively plausible reason ever to code in C, except where artificial constraints mandate it, or where merit doesn't matter. (I leave it to the reader to decide where Linux and BSD kernels fit in that.)

There is especially no excuse for systemd to be coded in C.