Rust has some properties that I find very desirable. Writing Rust programs from scratch is also not as scary as the internet had led me to believe. The documentation is excellent, the compiler diagnostics are very helpful, and the notorious borrow checker didn't stand in my way that much. And I love Cargo and crates.io. I have some projects where Rust is a saner choice than Go or other GC-based languages.
That said, there are actually drawbacks of Rust compared with Go, IMHO. When facing a moderately large project written by others, the ergonomics for diving into the project is not as smooth as Go. There is no good full-source code indexer like cscope/GNU Global/Guru for symbol navigation across multiple dependent projects. Full text searching with grep/ack does not fill the gap well either since many symbols, with their different scopes/paths, are allowed to have the same identifier without explicitly specifying the full path. That makes troubleshooting/tracing a large, unfamiliar codebase quite daunting compared with Go.
Hmm, I've had a very nice experience using Rusty Code in VS Code. Some useful refactoring functionality is missing for sure, but a lot of that will become possible quite shortly with RLS (Rust Language Server, a la how TypeScript works in VSC), and if your preferred editor supports the language server spec (it's an open source common spec, not specific to Rust), it will support it at parity, too.
Can anybody make a strong case to me as to why buffer overflows are considered an issue in C when it takes like 10 minutes to write and test an array implementation that prevents that from ever happening? I do agree that C has issues (though in my opinion neither Rust nor Go addresses almost any of them). I just don't understand why buffer overflows are such a huge problem in C when the same thing is going to come up when trying to work with memory in Rust.
> Can anybody make a strong case to me as to why are buffer overflows considered an issue in C when it takes like 10 minutes to write and test an array implementation that prevents that from ever happening?
The CVE database. Just because you 'can' write such an array implementation doesn't mean you will, doesn't mean your third party libs will, doesn't mean any of your legacy code uses it, and certainly doesn't mean you will properly test said array implementation.
The number of mitigations added to C compilers and OSes dealing mostly with C and C++ code. ASLR, W^X, /GS, -fstack-protector-all, AddressSanitizer, ... - note the lack of similar tools, or demand for them, for, say, JavaScript - despite it enjoying a similar ubiquity.
I ask this in bad faith: I encourage you to share a single nontrivial codebase which actually creates the abstraction you've described and religiously adheres to using it throughout. As to why this is in bad faith: I'm defining "nontrivial" here to mean using 3rd party APIs - which will operate on C style arrays, not your project specific safe wrappers - and thus by definition won't be "religiously" sticking to said abstractions when using said APIs. By these definitions, the codebase I'm asking for doesn't exist. Even relaxing the "third party" rule, I haven't actually worked on a nontrivial C or C++ codebase without buffer overflow problems.
Now, e.g. Rust will have the same problems when interacting with C APIs - and nontrivial programs will end up doing so eventually. However, by virtue of the language itself embracing safe-by-default, you're less likely to run into the same problems when consuming Rust APIs.
You can also use third party static analysis tools to ensure you're using a "safe C subset" (such as MISRA C), but "nobody" does that.
The lack of generics means your array implementation is going to either:
- be implemented with macros and token pasting, and result in a ton of mental overhead because you'll have a pile of types like array_foo for an array of `foo`s, and array_bar for an array of `bar`s, along with a pile of corresponding `foo * array_foo_get(array_foo, size_t)` and `bar * array_bar_get(array_bar, size_t)` functions.
- or, have a runtime cost and lose type safety by storing void* and casting when accessing.
The first case is even worse than it sounds: e.g. I don't know how you handle arrays of types with spaces in them (like `unsigned char`, or `struct bar`) with a macro. And, we haven't even thought about const correctness yet, which would probably require having const_array_foo, const_array_bar (etc.) types defined too.
(And, of course, these only solve one facet of the problems with C's pointers: there's no way to defend against use-after-free or dangling pointers.)
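For comparison, here is a rough sketch of how generics sidestep both bullets in Rust: one checked accessor covers every element type, with no per-type macro expansion and no `void*` casts. The names `element_at` and `Bar` are made up for illustration.

```rust
/// Hypothetical helper: checked element access that works for any element type.
fn element_at<T>(items: &[T], index: usize) -> Option<&T> {
    // `get` does the bounds check and returns None when `index` is out of range.
    items.get(index)
}

/// Stand-in for the `struct bar` mentioned in the comment above.
#[derive(Debug, PartialEq)]
struct Bar {
    id: u32,
}

fn main() {
    let bytes: Vec<u8> = vec![10, 20, 30];
    let bars = vec![Bar { id: 1 }, Bar { id: 2 }];

    // The same function serves both element types; the compiler
    // generates a specialized copy for each one.
    assert_eq!(element_at(&bytes, 2), Some(&30u8));
    assert_eq!(element_at(&bars, 0), Some(&Bar { id: 1 }));
    assert_eq!(element_at(&bars, 9), None);
}
```

Because the function takes a shared reference, the "const correctness" variant comes for free; a mutable version would take `&mut [T]` instead.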
Because your 'safe' implementation will certainly have a performance cost, and won't be the default. This is why, despite C++ providing std::array, you'll still find buffer overflows in C++ code. C++'s std::array provides the safe 'at' function but you're opting into a performance penalty and it's not the more familiar [] syntax.
Rust arrays/vectors are safe-by-default. To use the unchecked, unsafe version requires using the 'unsafe' keyword.
    let v = vec![0, 1, 2];
    unsafe {
        // Index 5 is out of bounds and nothing checks it: undefined behavior.
        let x = v.get_unchecked(5);
    }
This means you can basically grep audit for vulnerabilities, and the above code should be very rare.
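As a counterpart to the unchecked call, a minimal sketch of what the default, checked paths look like. The helper name `checked_read` is invented here for illustration.

```rust
/// Hypothetical helper: read index `i`, returning None when it is out of range.
fn checked_read(v: &[i32], i: usize) -> Option<i32> {
    // `get` performs the bounds check instead of reading past the end.
    v.get(i).copied()
}

fn main() {
    let v = vec![0, 1, 2];

    assert_eq!(checked_read(&v, 1), Some(1));
    assert_eq!(checked_read(&v, 5), None);

    // Plain indexing is checked as well: `v[5]` would panic at runtime
    // rather than read memory outside the vector.
}
```

Neither form needs `unsafe`, so a search for that keyword finds only the call sites that opted out of the checks.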
> i just don't understand why are buffer overflows such a huge problem in C when the same thing is going to come up when trying to work with memory in Rust.
False. Buffer overflows in C can overwrite the program's memory, so it can be hijacked and supplanted with the attacker's code. This cannot happen in Rust (unless unsafe code has the vulnerability), or any memory safe language.
Sure you can implement a safe array/buffer abstraction and use it in your C programs that abort on invalid indexing. Now how many actually do this? Very few given the prevalence of C programs on vulnerability disclosure lists.
Obviously, one can program C to do anything, and write all the provably safe abstractions wished. But, that's not really the point. The point is that doing such is not the default. It requires engagement and knowledge of the programmer, especially on distributed projects with loose communication, such as many open source projects. And it only takes one programmer mistake to bring the whole house of cards down.
Why allow programmers to make mistakes? That was fine in the 70's when resources for compiler execution were limited. I don't see any reason for it today.
I mean, just look at the underhanded C contestants and especially winners for ways in which your program can completely blow up for extremely subtle reasons.
1. Buffer overflows aren't considered the most insidious issue in C nowadays. That award would probably go to use after free, which is not so easy to fix.
2. In C, it is easier and faster to do the wrong thing. Compare "char buf[256]; strcpy(buf, foo); ..." to "array_t buf = array_create(strlen(foo) + 1); strcpy(buf.ptr, foo); ... array_destroy(buf);"
3. Buffer overflows do not in fact come up routinely in Rust the way they do in C.
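To illustrate point 2 from the Rust side, here is a rough sketch in which the shortest thing to write is also the safe one. The function `copy_into` and the 256-byte buffer are illustrative choices mirroring the C snippet, not code from any real project.

```rust
/// Hypothetical helper: copy `src` into a fixed-size buffer,
/// refusing input that does not fit instead of overflowing.
fn copy_into(buf: &mut [u8; 256], src: &str) -> Result<usize, String> {
    let bytes = src.as_bytes();
    if bytes.len() > buf.len() {
        return Err(format!(
            "input of {} bytes does not fit in {} bytes",
            bytes.len(),
            buf.len()
        ));
    }
    // The slice lengths match, so this copy cannot write past `buf`.
    buf[..bytes.len()].copy_from_slice(bytes);
    Ok(bytes.len())
}

fn main() {
    let foo = "hello";

    // The easy path: an owned String sizes itself to the input.
    let owned = String::from(foo);
    assert_eq!(owned.len(), 5);

    // The fixed-buffer path: oversized input is rejected, not written.
    let mut buf = [0u8; 256];
    assert_eq!(copy_into(&mut buf, foo), Ok(5));
    assert!(copy_into(&mut buf, &"x".repeat(300)).is_err());
}
```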
> Can anybody make a strong case to me as to why are buffer overflows considered an issue in C when it takes like 10 minutes to write and test an array implementation that prevents that from ever happening?
That it's not a by-default, enforced language feature, and that most developers aren't going to spend those 10 minutes when they need an array.
They'll just use the language-provided array-implementation instead. Which in C is very, very unsafe.
Compiler vendors have been resistant towards putting in such features. Bounds checking slows things down, and the performance race is very much a thing in C compiler implementations -- a compiler that can deliver a few percentage points better code can be a big win to teams working on compute heavy problems. C11 has Annex K, which adds a lot of safety features, such as bounds-checked variants of the standard string and memory functions. Unfortunately, the major vendors have not implemented it in full, even as an option. That is a shame, because it would solve a lot of problems while requiring minimal rewrites for a lot of code.
I think it breaks down when that array has to interact with system libraries or the C stdlib in any way.
A lot of C string functions have weird gotchas related to terminators and sizes, and any IO you're doing will involve raw buffers being passed into or out of a system IO function that doesn't understand custom array types.
It's useful to be able to remove safety checks for speed. I have a C++ code where all data is in array objects. Bounds checking is a compile time option, and it makes the overall code 2X slower. I can do testing with bounds checking on, but once it gets to a supercomputer that needs to be removed. Address sanitizing by compilers is an even more effective tool for this, especially for C. Bounds checking is critical for security, but if you're only concerned with correct execution then a segfault is not much different from an exception.
Technically you probably could limit yourself to using a "safe" subset of C - basically no pointer arithmetic, no strcpy() etc - but that would defeat the purpose of using C in the first place.
Or perhaps just some helper functions in C that wrap array and pointer allocation/access to provide sanity checks. Seems like moving to a new language is rather extreme....
A 62 KLOC secure NTP server seems like an ideal project for this kind of experiment. I imagine it would be self-contained enough to actually use Rust or Golang instead of just treating them like FFI scripters.
> One such cleanup: we’ve made a strong start on banishing unions and type punning from the code. These are not going to translate into any language with the correctness properties we want.
Really? This sounds like idiomatic rust to me (heavy with enums).
C unions have no discriminant. But yeah it's a pity—I'd try to hack up discriminated unions then. Or convert to Rust unions and then enums and remove unsafety.
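For readers who haven't seen the difference, a minimal sketch of a discriminated union as a Rust enum. The `Value` type and its variants are invented for illustration and are not taken from the NTPsec code.

```rust
/// Hypothetical field that a C codebase might store in an untagged union.
/// The enum carries its own discriminant, so the active variant is always known.
#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    Float(f64),
    Text(String),
}

/// The compiler checks that every variant is handled, and each arm
/// can only see the payload that belongs to its variant.
fn describe(v: &Value) -> String {
    match v {
        Value::Int(n) => format!("int {}", n),
        Value::Float(x) => format!("float {}", x),
        Value::Text(s) => format!("text {}", s),
    }
}

fn main() {
    assert_eq!(describe(&Value::Int(7)), "int 7");
    assert_eq!(describe(&Value::Float(1.5)), "float 1.5");
    assert_eq!(describe(&Value::Text("ntp".to_string())), "text ntp");
}
```

What this does not give you is type punning: reading an `Int` payload as a `Float` has no safe equivalent, which is presumably why the quoted cleanup removes such code before translating.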
Outside of the language war bubble it's really great to see a post like this. Practical concerns, reasonable advantages/disadvantages of each language, a real project dealing with real timelines. Thanks!
Was going to say the same. In the past ESR has come across as a patently arrogant gun maniac, but the first part of this post is great for all the reasons you mentioned.
(What irritated me though was the switch to first-person narrative at the end).
I'm excited to see where this goes because it could go a long way towards providing concrete data for the large "work to replace old infrastructure C code with (Rust||Go||Modern C++)" discussion that has been taking place.
More data points will help to inform discussion, or at the very least add structure to the flame wars.
This is literally the only thing I can think of that "NTPsec" can do that would result in the project having any relevance. I understand why some very specific sites are chained to the ntpd codebase, but the vast, overwhelming majority of the ntpd deployed base not only isn't tied to ntpd, but also doesn't need 99% of what ntpd does. Trying to "secure" that codebase always seemed to me like a very silly windmill to tilt at.
I wish more mention of D would happen. It is compatible with C and C++ libraries and features GC without sacrificing the good things of C and C++. I always loved the idea of Rust and Go but they are nowhere near C or C++ where it matters to me. D fits the bill, otherwise I just use Python. I like being able to design software in my own way as opposed to being told how to do it.
How much concern would non-standard architecture support matter for ntp? Given how many architectures Linux supports, I would think that C would still be the best choice, until these other languages gain support for those missing architectures.
Or perhaps it's a good opportunity for a language which offers transpilation with ANSI C as the target?
I would start by getting the code to compile with g++, then begin migrating the dangerous C constructs to safe C++ constructs. IMO, that would be a safe, reasonable thing to do.
After reading this post the idea of a C-to-C translator that injects bound checking, etc. comes to mind. Such translator could be used by OS distributions to provide safety in the least intrusive way and possibly completely automatically for many C codebases they have in their repositories. Translating into Go or Rust, on the other hand, cannot scale beyond some individual projects, that decide to undertake such efforts. Mainstream C compilers could implement safety features too, but realistically it cannot happen, as it's not something most people care about. So, C-to-C translator might be a best bet with the most impact.
It's not yet ready for primetime, but Scala Native (http://scala-native.readthedocs.io/en/latest/) might just make a splash in the systems space. I don't think it has anything like ownership yet, but I wouldn't be surprised if it eventually develops that capability. I think you can get it to run without GC, too, but using C Stdlib memory management. Although, that largely defeats the memory-safety.
Just throwing it out there as something to keep an eye on!
Looking at the current new and upcoming languages, I would take a hard look at Nim. It may not be there yet, but it looks highly appealing, is as fast as the often-mentioned Rust, and compiles significantly faster.
http://nim-lang.org/
I didn't understand why rust and go are natural alternatives to C. Wouldn't C++ be a more natural option? (Despite the fact that both go and rust are developed by third party companies)
I did a lot of C++ years ago, so maybe things have changed since then, but I think Rust and Go addressed a lot of the design flubs of C++.
My experiences getting things to compile across gcc and visual c++, dealing with strings (especially Microsoft's WCHAR), reliable integer sizes (pre stdint.h), and debugging templates were not things I would wish on anyone.
Re-doing some of my side projects in Go and Rust was a lot more enjoyable. I could focus on what I was doing instead of trying to work around deficiencies in the language and its libraries.
My experience is that using C for main(argc,argv)-style programs is rarely a problem. Trouble comes when using long running single-address space containers for service-like abstractions with pthreads etc.; in that kind of environment, malloc() and co. don't cut it because even if you get memory allocation right, memory fragmentation becomes a serious (ie. insurmountable) problem unless you use pooled memory allocators.
It's been said over and over since at least the Java times that creating OS processes for individual service invocations is bad for performance, but I've never seen proof for this statement in the form of a benchmark.
Even the OpenBSD developers (who know a thing or two wrt. security of memory allocation schemes) diss process-per-service-invocation architectures in their httpd implementation (eg. calling their CGI bridge "slowcgi" and favouring fcgi over it).
Isn't that inconsequential? I mean if there's a performance problem with CGI-like process-per-service invocations, why not target these problems at the OS level (or via pooling of network connections or whatever the bottleneck is)?
Rust is surely fine, and an improvement over C, but its main advantage is that all the rust code is written now, when everyone takes more care about security.
It doesn't have to deal with 40 years of bad legacy code written by sloppy developers.
You can obtain similar quality in a modern C codebase, using tools like static and dynamic analyzers. In fact, today the hardest issues come from multi-threading. I wouldn't even dare to write multi-threaded apps without helgrind/TSAN.
And Rust doesn't help in this regard. From: https://doc.rust-lang.org/nomicon/races.html
'So it's perfectly "fine" for a Safe Rust program to get deadlocked or do something incredibly stupid with incorrect synchronization.'
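It may help to see what the quoted page does and does not cover. The same Nomicon chapter states that Safe Rust guarantees an absence of data races, while general race conditions and deadlocks remain possible. A minimal sketch of the part that is checked, with `parallel_count` as a made-up example function:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical example: `workers` threads each add 1 to a shared
/// counter `per_worker` times and the final total is returned.
fn parallel_count(workers: usize, per_worker: usize) -> usize {
    // Sharing a plain `usize` mutably across threads would not compile;
    // the counter has to go behind a synchronization type such as Mutex.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    assert_eq!(parallel_count(4, 1000), 4000);
}
```

Nothing here stops two mutexes from being locked in inconsistent order, so the deadlock concern in the quote still stands; only unsynchronized access to the shared data is ruled out at compile time.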
I like go, I'd love to write libraries in it but as far as I can tell you can't really create a C compatible shared library from it.
That is still the common denominator if you want to call into it from other languages.
I'd love to write Python programs with performance critical stuff in Go
I do not contest the opinion that Rust is a good language, but it slightly hurts me when people lump C and C++ together. One can easily write correct-by-construction code using modern C++. Metaprogramming allows you to create typesafe constructs. It provides you with zero cost abstractions to specify ownership of resources and ..... <I can go on> One just has to strive not to use the C baggage that comes with it.
> Under Linux, some SECCOMP initialization and capability dances having to do with dropping root and closing off privilege-escalation attacks as soon as possible after startup.
I was under the impression that these specific things were actually quite hard to do in Go. I believe that both setuid/setgid and seccomp_load change the current OS thread (only), and since Go multiplexes across multiple threads and gives programmers very little control over which ones are used for what goroutines, I'm not sure how you would, for example, apply a seccomp context across all threads in a Go program. setuid/setgid are currently unsupported for this reason, with the best method being "start a subprocess and pass it file descriptors" (https://github.com/golang/go/issues/1435).
I'd be interested to hear if others have found ways to actually do this reliably for all OS threads underlying a running Go process.
[+] [-] e3b0c|9 years ago|reply
[+] [-] xorxornop|9 years ago|reply
[+] [-] Manishearth|9 years ago|reply
You can also use https://github.com/nrc/rust-dxr to index rust code via DXR.
IIRC ctags also works with Rust.
RLS should cover this pretty well too once it happens.
[+] [-] wocram|9 years ago|reply
There are also many other tools that provide indexing, eg. ide [plugins], kythe, and the rust language server.
[+] [-] steveklabnik|9 years ago|reply
[+] [-] leshow|9 years ago|reply
[+] [-] dreta|9 years ago|reply
[+] [-] MaulingMonkey|9 years ago|reply
[+] [-] dbaupp|9 years ago|reply
[+] [-] staticassertion|9 years ago|reply
[+] [-] naasking|9 years ago|reply
[+] [-] jdmichal|9 years ago|reply
[+] [-] qznc|9 years ago|reply
[+] [-] pcwalton|9 years ago|reply
[+] [-] josteink|9 years ago|reply
[+] [-] strictfp|9 years ago|reply
[+] [-] Sanddancer|9 years ago|reply
[+] [-] ssalazar|9 years ago|reply
[+] [-] dibanez|9 years ago|reply
[+] [-] ChemicalWarfare|9 years ago|reply
[+] [-] JKCalhoun|9 years ago|reply
[+] [-] gsdean|9 years ago|reply
[+] [-] jstewartmobile|9 years ago|reply
Scenarios where I frequently end up fixing other people's memory errors:
1. No Error Handling: not checking an error condition on a function that allocates, then using the uninitialized pointer anyway
2. Sloppy Error Handling: jumping to abort from an error without freeing what has already been allocated
3. Faith in \0: still using the old string functions
I'm on the fence about the whole thing, so others may be able to field something more compelling.
[+] [-] ArkyBeagle|9 years ago|reply
Y'all please, please note that dreta said "... an array implementation that prevents that from ever happening..."
[+] [-] jstewartmobile|9 years ago|reply
[+] [-] awinter-py|9 years ago|reply
[+] [-] Ericson2314|9 years ago|reply
[+] [-] aaron-lebo|9 years ago|reply
[+] [-] jstimpfle|9 years ago|reply
[+] [-] eduren|9 years ago|reply
[+] [-] tptacek|9 years ago|reply
[+] [-] giancarlostoro|9 years ago|reply
[+] [-] falcolas|9 years ago|reply
[+] [-] w8rbt|9 years ago|reply
[+] [-] zzzcpan|9 years ago|reply
[+] [-] acjohnson55|9 years ago|reply
[+] [-] jaco8|9 years ago|reply
[+] [-] nimmer|9 years ago|reply
[+] [-] rafinha|9 years ago|reply
[+] [-] jstewartmobile|9 years ago|reply
[+] [-] jimbokun|9 years ago|reply
Doesn't sound like C++.
[+] [-] AgentME|9 years ago|reply
[+] [-] blub|9 years ago|reply
Oh well, at least they're moving from C, which will be a big win either way.
[+] [-] tannhaeuser|9 years ago|reply
[+] [-] amadvance|9 years ago|reply
[+] [-] ericfrederich|9 years ago|reply
[+] [-] asdaksdhksajd|9 years ago|reply
[+] [-] arunmu|9 years ago|reply
[+] [-] kcudrevelc|9 years ago|reply
[+] [-] eggy|9 years ago|reply