Curl is C | WingNews

[+] simias|9 years ago|reply

I have no problem with Curl being written in C (I'll take battle-tested C over experimental Rust) but this point seemed odd to me:

>C is not the primary reason for our past vulnerabilities

>There. The simple fact is that most of our past vulnerabilities happened because of logical mistakes in the code. Logical mistakes that aren’t really language bound and they would not be fixed simply by changing language.

So I looked at https://curl.haxx.se/docs/security.html

#61 -> uninitialized random : libcurl's (new) internal function that returns a good 32bit random value was implemented poorly and overwrote the pointer instead of writing the value into the buffer the pointer pointed to.

#60 -> printf floating point buffer overflow

#57 -> cookie injection for other servers : The issue pertains to the function that loads cookies into memory, which reads the specified file into a fixed-size buffer in a line-by-line manner using the fgets() function. If an invocation of fgets() cannot read the whole line into the destination buffer due to it being too small, it truncates the output

This one is arguably not really a failure of C itself, but I'd argue that Rust encourages a more robust error handling through its Options and Results when C tends to abuse "-1" and NULL return types that need careful checking and can't usually be enforced by the compiler.

#55 -> OOB write via unchecked multiplication

Rust has checked multiplication enabled by default in debug builds, and regardless of that the OOB wouldn't be possible.

#54 -> Double free in curl_maprintf

#53 -> Double free in krb5 code

#52 -> glob parser write/read out of bound

And I'll stop here, so far 7 out of 11 vulnerabilities would probably have been avoided with a safer language. Looks like the vast majority of these issues wouldn't have been possible in safe Rust.

[+] attractivechaos|9 years ago|reply

> Rust has checked multiplication enabled by default in debug builds, and regardless of that the OOB wouldn't be possible.

That bug is only triggered by an unrealistic corner case (username longer than 512MB). Run-time check in debug build won't help unless you realize the possibility before hand and put a unit test for that. I am more interested in the second part of your comment: "the OOB wouldn't be possible". How does rust protect against such integer overflow caused by multiplication? Thanks in advance.

[+] hzhou321|9 years ago|reply

> The simple fact is that most of our past vulnerabilities happened because of logical mistakes in the code.

All bugs are logical errors at various scale.

[+] zzzcpan|9 years ago|reply

I think you are being too generous. All of those 11 vulnerabilities were caused by the language, its lack of memory safety, limited expressiveness, poor abstractions it encourages, etc.

[+] astrobe_|9 years ago|reply

> Looks like the vast majority of these issues wouldn't have been possible in safe Rust.

Is it not where you're supposed to point to re-implementations or equivalents written in Rust?

[+] eddieroger|9 years ago|reply

> Of course that leaves a share of problems that could’ve been avoided if we used another language. Buffer overflows, double frees and out of boundary reads etc, but the bulk of our security problems has not happened due to curl being written in C.

He addressed all of those points in the second short paragraph. None of those are C vulnerabilities, they were mistakes made on the part of the developers, not the language. Avoidance of problems in a safer language doesn't mean when things happen, it's the language's fault.

[+] ameliaquining|9 years ago|reply

I'm kind of torn on this.

On the one hand, Curl is a great piece of software with a better security record than most, the engineering choices it's made thus far have served it just fine, and its developers quite reasonably view rewriting it as risky and unnecessary.

On the other hand, the state of internet security is really terrible, and the only way it'll ever get fixed is if we somehow get to the point where writing networking code in a non-memory-safe language is considered professional malpractice. Because it should be; reliably not introducing memory corruption bugs without a compiler checking your work is a higher standard than programmers can realistically be held to, and in networking code such bugs often have immediate and dramatic security consequences. We need to somehow create a culture where serious programmers don't try to do this, the same way serious programmers don't write in BASIC or use tarball backups as version control. That so much existing high-profile networking software is written in C makes this a lot harder, because everyone thinks "well all those projects do it so it must be okay".

[+] marcosdumay|9 years ago|reply

At extreme cases, where you want to prove your network stack, you'd have to write it in C (or something equivalently dangerous), because you can't prove big runtimes.

But in mid-way applications, dangerous languages are dangerous, and should be avoided.

I'd think the most "correct" way to handle this is to create small bits of network code with proof of correctness that handles down parsed and tagged (typed) data into code in high level languages. In that case, Curl type of code would still be in C.

[+] baldfat|9 years ago|reply

> the only way it'll ever get fixed is if we somehow get to the point where writing networking code in a non-memory-safe language is considered professional malpractice.

So using Linux, Windows, BSD or MacOS servers are malpractice? I think you might have over stated your case. So are you waiting for a memory safe Herd re-write? A memory safe any OS will be decades away if someone wanted to start tackling it now.

[+] Corrado|9 years ago|reply

I get the feeling that very few old school authors would come forward and say "Hey, my project code is not good and needs to be re-written." Most of the time the message is more like "Sure we're using C but we're very careful and have built in a lot of safeguards."

I like your idea of a default mindset of internet tools should be written in a safe language with proper engineering techniques. I'm not sure I would go as far as malpractice but it might be a good stick to use to force makers into better practices.

[+] greenhouse_gas|9 years ago|reply

Small history of security:

First major security breech through buffer -overflow was in the late 80s.

So when Java came out, they played it "safe" - the language, despite having pointers will be absolutely safe. NullPointerExceptions and ArrayOutOfBoundsException will cause the program to crash rather than corrupting the stack.

Perfect.

Except it wasn't. It ended up being so "holy" that it's now banned in browsers.

So, everyone said to move to JS. Another "perfectly safe" language.

But it's too slow.

So JIT it.

Now it's no longer "perfectly safe".

Rinse and repeat.

And rust won't help here, because while the " compiler " can be guaranteed safe, the code it outputs can't (think of a C compiler written in Rust).

Maybe the solution isn't to rely on language (except for the Kernel) but to make of easy to spawn OS processes that simply have no rights to call any syscalls and limited amount of memory (or a white-listed amount of syscalls).

Take it like this:

Firefox (the browser) has full rights. It starts a process (which can only connect to the network to IP RemoteHost).

If process dies (for whatever reason) or takes too long, tell user that "sorry, sites broken".

Now, malicious code causes the attacker to run arbitrary code? Who cares? You can't overwrite the browser's code and can't break out.

The browser just has to ensure that its subprocess gives you good output.

Same with JS, CSS, or image libraries.

[+] rwmj|9 years ago|reply

While this doesn't so much apply to libcurl (but see below), there is a third alternative to "write everything in C" or "write everything in <some other safer language>". That is: use a safer language to generate C code.

End users, even those compiling from source, will still only need a C compiler. Only developers need to install the safer language (even Curl developers must install valgrind to run the full tests).

Where can you use generated code?

- For non-C language bindings (this could apply to the Curl project, but libcurl is a bit unusual in that it doesn't include other bindings, they are supplied by third parties).

- To describe the API and generate header files, function prototypes, and wrappers.

- To enforce type checking on API parameters (eg. all the CURL_EASY_... options could be described in the generator and then that can be turned into some kind of type checking code).

- Any other time you want a single source of truth in your codebase.

We use a generator (written in OCaml, generating mostly C) successfully in two projects: https://github.com/libguestfs/libguestfs/tree/master/generat... https://github.com/libguestfs/hivex/tree/master/generator

[+] KuiN|9 years ago|reply

> generate C code.

Programmatically generating C code not without problems. How can you prove that the C you're generating is free from problems solved by the safer language? Cloudbleed came from computer generated C code: https://blog.cloudflare.com/incident-report-on-memory-leak-c....

[+] mushiake|9 years ago|reply

FFTW[0] is also written like that (generator written in OCaml emitting C).

[0]http://www.fftw.org/

[+] chii|9 years ago|reply

> generate C code.

how is that different from just writing it in another language? End users who need to compile will be able to regardless of the generated C code, but the end users who need to do a _little_ modification will be given ugly generated C code! Seems stictly worse to me...

[+] ndesaulniers|9 years ago|reply

I wonder how cloudflare feels about that? Ragel

[+] tannhaeuser|9 years ago|reply

Not only is curl based on C, but so are operating systems, IP stacks and network software, drivers, databases, Unix userland tools, web servers, mail servers, parts of web browsers and other network clients, language runtimes and libs of higher-level languages, compilers and almost all other infrastructure software we use daily.

I know there's a sentiment here on HN against C (as evidenced by bitter comments whenever a new project dares to choose C) but I wish there'd be a more constructive approach, acknowledging the issue isn't so much new software but the large collection of existing (mostly F/OSS) software not going to be rewritten in eg. Rust or some (lets face it) esoteric/niche FP language. Even for new projects, the choice of programming language isn't clear at all if you value integration and maintainability aspects.

[+] wyldfire|9 years ago|reply

> I know there's a sentiment here on HN against C

I think there's two major against-C groups: those of us who have worked with C for decades and those who never worked with it. I'll try and speak for those of us who've used it for decades. The popular high-level languages that have arrived since ~1995 (Java, Python, JS, C# and friends) are excellent productivity increases. In general, they sacrifice memory and performance in favor of robustness and security. For enormous software problem domains, we just don't need C's complexity or error-proneness.

Until Rust, there's been very close to zero serious competitors for C if I wanted to write a bootloader, OS, or ISR. Not even C++ could do those (without being extremely creative on how it's built/used). The ~post-2000 languages (golang, swift, D etc) can't do that (perhaps D's an exception but it wasn't an initial goal AFAICT). This is huge, IMO.

We've groaned and grumbled about how hard it is to parse C/C++ code for decades. This is a big deal for tooling. Because of the language's design, even if you use something "simple" like libclang to parse your code, you still have to reproduce the entire build context just to sanely make an AST. All of those other new languages above probably address this problem but also add all kinds of other stuff which we can't have for specialized problem domains (realtime/low-latency requirements, OSs, etc).

> collection of ... software not going to be rewritten in eg. Rust or some (lets face it) esoteric/niche FP language

IMO it's not appropriate to lump Rust in with "nice FP language"s. And don't look now but lots of stuff is being rewritten in Rust. Fundamental this-is-the-OS-at-its-root stuff: coreutils [1], "libc" [2], kernels [3], browser engines [4].

[1] https://github.com/uutils/coreutils

[2] https://github.com/japaric/steed

[3] https://github.com/redox-os

[4] https://github.com/servo/servo

[+] fiedzia|9 years ago|reply

> the large collection of existing (mostly F/OSS) software not going to be rewritten in eg. Rust or some

It is happening and will keep happening and it is really necessary at some point. Sure, I don't expect large project to be rewritten overnight, but every large project is being redone eventually. Especially when C becomes main source of problems. And you can introduce better languages gradually.

> operating systems, drivers

https://github.com/redox-os/redox

> IP stacks

https://github.com/QuiltOS/QuiltNet

Many of those are often written in userspace, where you are free to use any language. Several of them will provide sufficient performance.

> databases, Unix userland tools, web servers, mail servers, parts of web browsers and other network clients, language runtimes and libs of higher-level languages, compilers and almost all other infrastructure software we use daily.

For all of those you'll find Rust implementations. Some are work in progress, some are already widely used.

[+] pjmlp|9 years ago|reply

Historical accident, I started to work back in the days when UNIX was the only OS implemented in C.

Everything else outside UNIX was using Assembly, Algol, PL/I, Modula or Pascal dialect.

C owes its success to UNIX's adoption by the market, as operating system available almost free of charge to universities, with source code available.

[+] erelde|9 years ago|reply

I am student and I like C, I've tested Rust/Go, I like the feeling of C. Maybe that sentiment will change later, but for now, I like C. It's simple and sharp and there's lots of doc/books.

[+] vbezhenar|9 years ago|reply

A lot of mentioned software was started many years ago, when other languages were hardly viable options. And those programs are good enough now, so there's no much movement to replace them. It's not a good argument for C, IMO. It's like telling that Windows is awesome because so many users use it. But when people started from scratch (mobile world), it turned out, that Windows is not the best OS.

Actually when I'm reading about new software, it's very rare to encounter C. Usually it's something else.

[+] unwind|9 years ago|reply

Well put.

Didn't know that curl was stuck back on C89, that's really optimizing for portability.

If anyone is confused by the "curl sits in the boat" section header, that's basically a Swedish idiom being translated straight to English. That rarely works, of course, and I'm sure Daniel knows this. :)

The closest English analog would be "curl doesn't rock the boat", I think the two expressions are equivalent (if you sit, you don't rock the boat).

[+] paulddraper|9 years ago|reply

I didn't know "sit in the boat was a thing", but I liked it.

"Sit in boat" is a positive expression of the stability benefits.

[+] cat199|9 years ago|reply

he mentions rocking the boat later in the paragraph..

[+] devy|9 years ago|reply

The 7th point: "curl sits in the boat"

    In the curl project we’re deliberately conservative and 
    we stick to old standards, to remain a viable and reliable 
    library for everyone. Right now and for the foreseeable 
    future. Things that worked in curl 15 years ago still work 
    like that today. The same way. Users can rely on curl. We 
    stick around. We don’t knee-jerk react to modern trends. 
    We sit still in the boat. We don’t rock it.

I see a lot of inertia in there. While it's a great record to maintain 15-year consistency but in the era of every changing InfoSec outlook, it could be a legacy and baggage if the authors resist to change. One thing we know for sure is that human will make mistakes, no matter how skillful you are. In the context of writing a fundamental piece of software with an unsafe programming language, that means we are guarantee to have memory-safety induced CVE bugs in curl in the future.

Some of other points that the author raised are valid too. If there is a trade-off that we can have a safer piece of fundamental software by almost eliminating a whole category of memory safety related bugs, and with the downside of less compatibility with legacy systems, more dependencies etc., perhaps we should consider it? I believe the tradeoff is well worthy in the long run and option is ripe for explore.

[+] cestith|9 years ago|reply

How is the author resistant to change? He specifically said new code should be written in a language that meets the priorities for that code. He specifically said someone has or would write a competitor to curl in Rust or some other safer language and that a good one will take off. He welcomed that.

What he doesn't welcome is rewriting something that's had those bugs and the types of logic bugs not related to the language already worked out. There's a saying about a baby and bathwater.

Not everything is a dichotomy, and you shouldn't be reading the article as if the author is against newer languages. He specifically says that given a fresh start with the availability of these languages he might use something besides C. Carefully weighing options is wise. Throwing away years of actual progress for the appearance of quick progress is foolish.

[+] paulddraper|9 years ago|reply

> it could be a legacy and baggage if the authors resist to change

Could be. Or it could not be.

Somehow, Git has increased in popularity, despite its author's over-my-dead-body insistence on C.

[+] generic_user|9 years ago|reply

> it could be a legacy and baggage if the authors resist to change.

Every few years there is a new batch of programming languages that come out and they all gain a small passionate community that tries to convince the internet how much better that language is.

They inevitably use the argument that code not written in the new language is 'legacy' and 'resistant to change'.

Neither of those assertions are accurate or enlightening unless you can provide a proposed replacement and prove the superiority of the new code.

Simply telling other programmers to rewrite there code in xyz language with such arguments is primarily a case of armchair development.

If you really think it could be done better then do it and prove it.

[+] feld|9 years ago|reply

I've a wild idea: add support for pledge, capsicum, etc to your codebase so these vulns are neutralized on every OS with a good security framework

[+] throwaway5752|9 years ago|reply

It's extremely simple. If you think Curl would be better in another language then port it, release your alternative, and maintain it for a long time.

Even if your language (Rust, Erlang, LISP, Go) is "better", it's still a minimal part of the equation. A maintainer is what makes the tool. It's hard work to decide which PRs to accept (and worse yet, reject), to backport fixes to platforms for which you can't get a reliable contributor, coordinating fundraising/donations, keeping up with evolving standards...

Anyway. Thank you, thank you, thank you Daniel Stenberg. Use whatever damn language you want.

[+] kazinator|9 years ago|reply

> Use whatever damn language you want.

On the other hand, if he didn't want his justifications for that choice examined by the world, he wouldn't have aired them, right?

> If you think Curl would be better in another language then port it, release your alternative, and maintain it for a long time.

That's out there; some languages have URL downloading objects that are not based on Curl.

E.g. Edi Weitz's Drakma client library for Common Lisp doesn't seem to be using Curl as far as I can see.

http://weitz.de/drakma/

[+] derefr|9 years ago|reply

> A library in another language will add that language (and compiler, and debugger and whatever dependencies a libcurl written in that language would need) as a new dependency to a large amount of projects that are themselves written in C or C++ today. Those projects would in many cases downright ignore and reject projects written in “an alternative language”.

Why would I be vendoring my own copy of libcurl in my project? Who does? This is how I (or rather, the FFI bindings my language's runtime uses) consume libcurl:

    dlopen("libcurl.so")

I rely on a binary libcurl package. The binary shared-object file in that package needed a toolchain to build it, but I don't need said toolchain to consume it. That would still be true even if the toolchain required for compiling was C++ or Rust or Go or whatever instead of C, because either the languages themselves, or the projects, ensure that the shared-object files they ship export a C-compatible ABI.

An example of a project that works the way I'm talking about: LLVM. LLVM is written in C++, but exports C symbols, and therefore "looks like" C to any FFI logic that cares about such things. LLVM is a rather heavyweight thing to compile, but I can use it just fine in my own code without even having a C++ compiler on my machine.

(And an example of a project that doesn't work this way: QT. QT has no C-compatible ABI, so even though it's nominally extremely portable, many projects can't or won't link QT. QT fits the author's argument a lot better than an alternate-language libcurl would.)

[+] Sir_Cmpwn|9 years ago|reply

Agreed 100%. Definitely going to be trotting this article out next time I see someone blindly arguing for rewriting xyz in Rust.

I particularly like the mention of portability. No other language comes even remotely close to the portability of C. What other language runs on Linux, NT, BSD, Minix, Mach, VAX, Solaris, plan9, Hurd, eight dozen other platforms, freestanding kernels, and nearly every architecture ever made?

[+] cwyers|9 years ago|reply

I mean, sure, and if you have users running VAX or the Hurd, that matters. But it turns out that most of us use one of Linux, NT or OS X. And even if you add BSD and Solaris (and a few other Unixes) you can still find languages without C's known problems that cover 100% of users. "But embedded." Embedded can maintain their own software, they do all the time. How long are we going to insist that end users run software that cannot be secure because of the lowest common denominator of programming languages?

[+] NoGravitas|9 years ago|reply

I think the strongest argument is "rewriting would introduce lots of new bugs that we don't have now". It's a lot easier to justify staying the course with C on a project that has its troubled youth long behind it, than it is to justify starting a new project in C now.

[+] kazinator|9 years ago|reply

> The simple fact is that most of our past vulnerabilities happened because of logical mistakes in the code. Logical mistakes that aren’t really language bound and they would not be fixed simply by changing language.

This statement is laughable nonsense. Shall we go into their bug history and point out counterexamples left and right? [Edit:user simias has done this; thanks!]

Every single bug you ever make interacts with the language somehow.

Even if you think some bug is nothing but pure, that logic is part of a program, embedded in the program's design, whose organization is driven by language.

[+] coldtea|9 years ago|reply

>There. The simple fact is that most of our past vulnerabilities happened because of logical mistakes in the code. Logical mistakes that aren’t really language bound and they would not be fixed simply by changing language.

That's wrong. A lot of the C mistakes are indeed "logical mistakes in the code", but most of them would be indeed fixed by changing to a language that prevents those mistakes in the first place.

[+] chousuke|9 years ago|reply

In my view, the problem with C in general is that it's a loaded gun with no safety or trigger guard. It's trivial to shoot yourself (or someone else) in the foot, and it requires knowledge, meticulous care and lots of forethought to avoid getting shot.

I very much agree that rewriting existing, stable software written in C is likely not worth the trouble in many cases, but I can't accept claims that the limitations of C aren't the direct cause of tens of thousands of security vulnerabilities, either.

In Rust, even a less experienced developer can fearlessly perform changes in complicated code because the language helps make sure your code is correct in ways that C does not. And you can always turn off the safeties when you need to.

Experienced developers should feel all the more empowered by simply not having to always worry about things like accidental concurrent access, use-after-free, object ownership, null pointers or the myriad other trivial ways to cause your program to fail that are impossible in safe Rust. You get to worry about the non-trivial failure modes instead, which is much more productive.

[+] jeffdavis|9 years ago|reply

"C is not a new dependency"

To just use a library, rust isn't much of a dependency, either. It's designed so you don't even need to know that it's not C.

Rust would obviously be a build dependency, but that's lessened somewhat because it tries to make cross-compilation easy.

(But this point does apply to pretty much any other language. Curl would not be used as widely if it depended on the Go runtime, for instance.)

[+] tombert|9 years ago|reply

While I'm definitely not suggesting we replace Curl with a rewrite in Rust (since the current Curl has had decades of good testing and auditing done on it), I am actually very curious how a rewrite in a safer language like Rust, OCaml, Haskell, or Go would fair in comparison in regards to performance and whatnot.

If I were ambitious enough, I'd do it myself in Haskell, but I think it'd be too much work for a simpler curiosity.

[+] empath75|9 years ago|reply

This seems like a no-brainer for a re-implementation in rust, but I wouldn't expect that someone would rewrite curl itself in rust, but a new library that does the same things.

[+] floatboth|9 years ago|reply

> a new library that does the same things

Most languages already have HTTP client libraries. (In particular, Rust has Hyper. Ruby/Python/Node/Go have HTTP clients built-in in the stdlib, Haskell has http-client, etc.) Who uses libcurl really? (Spoiler alert… PHP.)

Of course libcurl does FTP and Gopher and all the things, but these aren't commonly required, most applications just need HTTPS.

[+] pionar|9 years ago|reply

but why? What does one gain over just using libcurl?

[+] unknown|9 years ago|reply

[deleted]

[+] geodel|9 years ago|reply

I think Rust community increasingly behave like this[1]. They are big on suggesting others the better 'ideas' instead of implementing themselves. So they keep using 'curl' and 'openssl' but tell others to rewrite their software with Rust.

1. http://dilbert.com/strip/1994-12-17

[+] coding123|9 years ago|reply

I don't see why this is an issue, whoever is arguing for a change can write rurl and be done, and see if anyone takes it up in their distributions.

[+] skocznymroczny|9 years ago|reply

I don't think C is a bad language, although I think it could use lists and dictionaries in standard library. std::vector and std::map are the only things that make me pick C++ in an instant, given the choice.

[+] adynatos|9 years ago|reply

While C by itself is not safe, I would argue that no sane development environment uses C by itself. Over the decades of its production use dozens of tools have been developed that make it far safer: *grind suite, coverage tools, sanitizers, static analyzers, code formatters and so on. Those tools are external, otherwise they would make C slower. Something for something.

[+] tete|9 years ago|reply

I think it's a bit weird that C and curl are used. If we look at C and OpenBSD or so things might look a bit different.

Also one has a hard time comparing curl with another language, simply because something with curl's properties (take portability for example) doesn't exist.

And no that isn't in defense of anything, just me thinking thinking that measurable points brought up in the discussions don't make sense or exist.

The topic is also a bit broader, as you can easily add in static code analysis, compiler flags, stuff like W^C, stuff like seccomp, capsicum, cloudabi, pledge which might not work (well) in other cases.

It's a great philosophical discussion topics and I don't wanna stop anyone, just hoping people keep that in mind, when they participate, so we don't end up with new dogmas that get thrown around for the next few year, without knowing contexts or meaning of phrases.

Other than that: I really enjoy this discussion. :)

370 comments