item 7578427

A New Development for Coverity and Heartbleed

125 points | neuroo | 12 years ago | blog.regehr.org

46 comments

[+] jdp23|12 years ago|reply
Interesting approach! Kudos to Coverity for jumping on it so quickly.

Taint analysis is notoriously prone to false positives; beyond the reasons listed in this post, there are many situations where relations between variables mean that tainted data doesn't cause problems. [For example, the size of the memcpy target (bp) is known to be greater than payload; so even though payload is tainted, there isn't a risk of a write overrun.] But even noisy warnings can be very useful: when we first implemented simple taint analysis in PREfix a decade ago, the first run was 99% false positives, but one of the real bugs we found was in a system-level crypto module. With the increased post-Heartbleed scrutiny, it seems like a great time to pay more attention to these classes of bugs.
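A minimal sketch of the pattern described above (hypothetical names and layout, not OpenSSL's actual code): the length field is tainted because it comes off the wire, but the write cannot overrun a destination that is statically larger than any 16-bit value, so a simple taint analysis may flag a memcpy that is in fact write-safe.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: `payload` is tainted (attacker-controlled, read
 * off the wire), yet the write cannot overrun `bp`, whose 65536 bytes
 * exceed any possible 16-bit length. A naive taint analysis may flag
 * the memcpy anyway. Whether the *read* from pkt stays in bounds is a
 * separate question; that read was exactly Heartbleed's bug.
 * Assumed precondition: pkt holds at least 2 + payload bytes. */
size_t copy_payload(unsigned char bp[65536], const unsigned char *pkt)
{
    /* Big-endian 16-bit length field: tainted data. */
    size_t payload = ((size_t)pkt[0] << 8) | pkt[1];

    /* payload <= 65535 < 65536, so no write overrun is possible. */
    memcpy(bp, pkt + 2, payload);
    return payload;
}
```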

[+] hackinthebochs|12 years ago|reply
>first run was 99% false positives but one of the real bugs we found was in a system-level crypto module.

Thinking of it as a false positive seems like the wrong perspective. The static analyzer is a tool that flags usages that are not proven to be correct. The fact that the flagged code turned out to be valid isn't the issue; the issue is that your code did not prove it valid to the satisfaction of the analyzer. This isn't necessarily a failing of the analyzer, but an indication that your code should be written in a different way, or provide more "evidence" that it's correct (i.e. guards/size checks).

The goal should be to write code in such a way that whatever tool you're using can prove it correct. Sure, the better the tool the easier this process is. But we really need to fundamentally rethink how we approach this problem.
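One way to read this is: put the evidence in the code itself. A minimal sketch (hypothetical helper, not from any real codebase) where an explicit clamp makes the bound locally provable:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of "providing evidence" to an analyzer: instead
 * of relying on an invariant the tool cannot see, clamp the length
 * explicitly. The memcpy is now trivially provable to stay in bounds. */
size_t copy_checked(unsigned char *dst, size_t dstlen,
                    const unsigned char *src, size_t n)
{
    if (n > dstlen)   /* explicit guard: visible to any analyzer */
        n = dstlen;
    memcpy(dst, src, n);
    return n;
}
```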

[+] skybrian|12 years ago|reply
I think it works even better if you can get help from the type system. For example, the SafeHtml interface in GWT [1] gives you some safety from Java's type checking and can also make additional static analysis easier. (Then it becomes an exercise in making sure the API is used as intended.)

Perhaps something similar could be done using typedefs in C?

[1] http://www.gwtproject.org/javadoc/latest/com/google/gwt/safe...
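A bare typedef in C would not help here, since typedefs are only aliases, but a one-field struct does give a genuinely distinct type. A minimal sketch of the idea (names hypothetical; a real constructor would escape its input rather than just wrap it):

```c
#include <string.h>

/* Hypothetical sketch of skybrian's SafeHtml idea in C. Wrapping the
 * pointer in a struct makes "sanitized" and "raw" strings distinct
 * types: passing a raw char* where a safe_html is expected is a
 * compile-time error, and the analyzer only has to check that all
 * constructors of safe_html actually sanitize. */
typedef struct { const char *s; } safe_html;

/* Only constructors produce a safe_html; this one merely wraps its
 * argument (illustrative only; a real one would escape it). */
safe_html safe_html_from_trusted(const char *trusted)
{
    safe_html h = { trusted };
    return h;
}

/* Sinks accept only the wrapped type. */
const char *render(safe_html h) { return h.s; }
```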

[+] pjungwir|12 years ago|reply
> additional locations in OpenSSL are also flagged by this analysis, but it isn’t my place to share those here.

Why do I get the feeling that we're going to see three months of new OpenSSL vulnerabilities, like we saw with Rails last year? I'm sure Heartbleed plus all the bad press about code quality means a lot of people are suddenly looking. Assuming there is more to find, does anyone have any advice for how we might prepare for it?

[+] vladd|12 years ago|reply
In case you want to try out static analysis on your own code-base, here's a link with the list of most popular tools in this field, grouped by language: http://en.wikipedia.org/wiki/List_of_tools_for_static_code_a...
[+] Joeri|12 years ago|reply
I configured a jshint inspection in a pre-commit hook on my team's code repository. Yesterday I checked the logs: in the last 10 days it prevented 15 commits, a few of which were actual bugs. For me static analysis is a no-brainer. The only challenge is finding the right set of settings so that you don't get too many false positives.
[+] kyberias|12 years ago|reply
Interesting: "As you might guess, additional locations in OpenSSL are also flagged by this analysis, but it isn’t my place to share those here."
[+] nodata|12 years ago|reply
Anyone know if these have been bug reported?
[+] lbarrow|12 years ago|reply
It's super cool to see the power of advanced static analysis these days. Props to the Coverity team for using the Heartbleed trainwreck to motivate new research on these problems.

That said, are there other ways to fix this class of problem? We have choices. We can continue to build ever-more-advanced tools for patching over the problems of C and C++, or we can start using languages that simply do not have those problems.

There will always be a need for C and C++ in device drivers, microcontrollers, etc. But there's no compelling reason why SSL implementations in 2014 should use languages designed to run on mainframes in 1973.

[+] pjmlp|12 years ago|reply
> There will always be a need for C and C++ in device drivers, microcontrollers, etc. But there's no compelling reason why SSL implementations in 2014 should use languages designed to run on mainframes in 1973.

Except that there are safer systems programming languages, some older than C, with bounds checking by default and compilers that allowed disabling it if really, really required[1]:

Algol (1960)

PL/I (1964)

Modula-2 (1978)

Mesa (1979)

Even VAX, B6500 and 68000 assembly have support for doing bounds checking.

[1] Not the first version of Algol though; according to Hoare's Turing Award speech, customers didn't want unsafe features:

<quote> A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interest of efficiency on production runs. Unanimously, they urged us not to—they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law. </quote>

[+] pcwalton|12 years ago|reply
Well, usually this stuff is written in C because other languages come with big runtimes that make them unsuitable for utility libraries that need to be callable by everyone and from everywhere.

That said, we're of course working on changing that with Rust. But I should note that memory safety without garbage collection is just hard: it requires the entire language design to be balanced on a delicate precipice. It's not surprising that it's taken a long time to get there.

[+] joveian|12 years ago|reply
You can fix almost any issue in C and use almost any programming paradigm. Some people consider this a bug, but I consider it a feature. There is no reason an SSL implementation written in C should be vulnerable to buffer overflow. Since C99 added compound literals (anonymous data allocation), it is even possible to have a string-plus-length struct and a single macro that converts C strings to that structure right in a function argument. It is true that C at best doesn't help correct behavior and sometimes actively encourages incorrect behavior, and I think this is a serious issue in the language. However, other languages take a "let me do things right for you" approach that works well as long as your definition of right sufficiently matches the language's. As paradigms come and go it is easy to hit corner cases in such languages, while C still works or can be adapted fairly easily, which makes it a good choice for core functionality if not for everything.
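The string-plus-length idea can be sketched with a C99 compound literal inside a macro (all names hypothetical); the sizeof trick assumes the macro is applied to string literals or char arrays, not to char pointers:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical string-plus-length struct. */
typedef struct {
    const char *data;
    size_t len;
} lstr;

/* C99 compound literal: builds an lstr in place, so a string literal
 * can be passed directly at a call site. sizeof(s) - 1 counts the
 * literal's bytes without the NUL; it would miscount for a char*. */
#define LSTR(s) ((lstr){ (s), sizeof(s) - 1 })

/* A bounded copy that can never overrun dst. */
size_t lstr_copy(char *dst, size_t dstlen, lstr src)
{
    size_t n = src.len < dstlen ? src.len : dstlen;
    memcpy(dst, src.data, n);
    return n;
}
```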

IMO, C fails at being sufficiently low level to do this as well as it could. I don't think C will ever be replaced by a higher level language; it will be replaced with a lower level language that is better at incorporating static analysis of particular usage patterns into the language. To put it another way: C makes you do a bunch of "extra" work compared to other languages without really helping you do that work; other popular languages I know of try to not make you do that "extra" work, which is often a good thing but not always. A true replacement for C will need to still make you do all the "extra" work but help you make sure that work is correct.

E.g. memory allocation: it is not that manual memory allocation and deallocation is necessarily unreliable, but that there is no single way to make it reliable that works well for all programs. Then again, there is also no way to do automatic memory management that works well for all programs.

I would never recommend C++, but my sense is that the current popularity of C++ might be connected to templates, which are flexible and powerful in a different way than C or any other language I know of.

(also C was designed for "minicomputers" not mainframes, so not really all that different from its modern usage)

[+] apaprocki|12 years ago|reply
Is this implemented in Coverity by a local model? (There is a reference to a model being applied.) Or was the actual product modified to support this? Can Coverity customers get hold of this now?
[+] neuroo|12 years ago|reply
The "model" makes reference of the model injection for memcpy.

The modification made by the team is described in John's blog post: "Their insight is that we might want to consider byte-swap operations to be sources of tainted data".

As Andy said (and was quoted), that's a modification we need to evaluate to see its overall impact in terms of false positives (FP). If it doesn't pass our acceptance tests for FP rate, it will probably still be made available behind an option... a bit too early to say.