There's no replacement for intelligence. Turning the world into an authoritarian dystopia in search of that replacement seems to be the popular thing to do, unfortunately.
tl;dr: because ossl_a2ulabel had no unit tests until a few days ago, the fuzzer could not have reached it through any combination of other tests.
That fuzzing is tricky was not the problem here. The problem is the culture that allowed ossl_a2ulabel to exist without unit tests. And before some weird nerd jumps in to say that openssl is so old we can't apply modern standards of project health, please note that the vulnerable function was committed from scratch in August 2020. Without unit tests.
I'm not familiar enough with C to know the answer, but I'm trying to think through how anything goes from untrusted input -> trusted input safely.
To sanitize the data, you're putting the input into memory to perform logic on it. Isn't that itself an attack vector? I would think that any language would need to do this.
There are a lot of different issues that can come up, but in practice ~80% of those (my made-up number) are out-of-bounds issues. So for example, say you're parsing a JSON string literal. What happens if the close-quote is missing from the end of the string? You might have a loop that iterates forward looking for the close-quote until it reaches the end of the input. What that code should do is then return an error like "unclosed string". If you write that check, your code will be fine in any language. What if you forget that check? In most languages you'll get an exception like "tried to read element X+1 in an array of length X". That's not a great error message, but it's invalid JSON anyway, so maybe we don't care super much. However, in C, array accesses aren't bounds-checked, so your loop plows forward into random memory, and you get a CVE roughly like this one.
In short, the issue is that you forgot a check, and your code effectively "trusted" that the input would close all its strings. If you never make mistakes like that, you can validate input in C just like in any other language. But the consequences of making that mistake in C are really nasty.
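The close-quote scan described above can be sketched in a few lines of C (the function name and signature are invented for illustration, not taken from any real JSON parser). The `i < len` test is exactly the check that's easy to forget; without it, the loop walks off the end of the buffer into whatever memory happens to follow:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: scan for the closing quote of a JSON string literal
 * starting at p[0], in a buffer of `len` bytes. Returns the index of the
 * close-quote, or -1 to signal an "unclosed string" error. */
static long find_close_quote(const char *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {  /* the bounds check you must not forget */
        if (p[i] == '"')
            return (long)i;
    }
    return -1;  /* invalid JSON: report it instead of plowing forward */
}
```

In a bounds-checked language, dropping the `i < len` condition would raise an index exception at the end of the input; in C, the same omission is a potential CVE.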
Just because something is in memory doesn’t mean that it is realistically executable. That’s why you can download a virus to look at the code without it installing itself.
You aren’t wrong that even downloading untrusted data is less secure than not downloading it. But to actually exploit a machine that is actively sanitizing unsafe data, you need either (A) an attack vector for executing code at an arbitrary location in memory, or (B) a known OOB bug in the code that you can exploit to read your malicious data, by ensuring your data is right after the data affected by the OOB bug.
>To sanitize the data, you're putting the input into memory to perform logic on it
Sure, but memory isn't normally executed.
One of the more common problems is not checking length. Many C functions assume sanitized data, so they don't check. There are functions for reading data that don't check length (gets is the most famous, but there are others), so if someone supplies more data than you have room for, the rest of the data just keeps going off the end. It turns out that in many cases you can predict where "off the end" is, and then craft that data to be something the computer will run.
One common variation: C assumes that many strings end with a null character. There are a number of ways to get a string to not end with that null, and if an attacker can force that, those functions will read/write past the end of the data, which is sometimes exploitable.
So long as your C code carefully checks the length of everything, you are fine. One common failure mode is checking length but miscounting by one character. It is very hard to get this right every single time, and if you mess it up just once, you are open to something unknown in the future.
(Note: there are also memory issues with malloc that I didn't cover, but that is something else C makes hard to get right.)
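The "string that doesn't end with that null" trap is easy to reproduce with strncpy (the buffer size of 8 and the function names here are mine, for illustration only):

```c
#include <assert.h>
#include <string.h>

/* strncpy() does NOT write a terminating '\0' when the source fills the
 * whole destination, leaving a string with no null at the end. */
static void copy_unsafe(char dst[8], const char *src)
{
    strncpy(dst, src, 8);   /* dst may be left unterminated */
}

/* The fix: copy one byte less and terminate unconditionally. */
static void copy_safe(char dst[8], const char *src)
{
    strncpy(dst, src, 7);
    dst[7] = '\0';          /* always leave a valid C string */
}
```

Any later strlen or printf("%s") on the unsafe copy will read past the end of dst, which is the read-past-the-end scenario described above.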
Adding more and better fuzzing instead of trying to fix the issue (potentially malicious user input inside a C library) seems like the wrong way to address the problem. Buffer overruns just shouldn’t be a concern of the developer or test suite but of the compiler or language runtime.
There are two problems: the CVE, and the fact that the current fuzzing harness did not find it. The CVE is getting fixed, but obviously the fuzzer needs work too, because it exists to find these kinds of issues before they get exploited in the wild.
It's being handled how it should be: this happened, let's handle it, and let's work out how to better address future problems.
Trusting the fuzzer without examining its coverage seems to be the main problem here.
I fail to see what is problematic about giving the developer control over the entire flow of the program. Quite the contrary, I am more concerned about the paradigm shift towards higher-level systems programming languages that hide more and more control from the developer while putting more of the burden on the perfection of the optimizer.
But modern fuzzers aren't open-loop; they are coverage-directed, adjusting their inputs to increase coverage. As the article points out, this works best if leaf functions are fuzzed; difficult-to-reach corners still might not be found.
The problem here was that the coverage of the fuzz testing was not being examined.
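The open-loop vs. coverage-directed distinction can be shown with a toy demonstration (everything here is invented: the 4-byte magic prefix stands in for a hard-to-reach corner of a real parser). A fuzzer throwing random bytes with no feedback essentially never reaches the deep branch, which is exactly why coverage feedback, and checking the coverage reports, matters:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static unsigned deep_hits, shallow_hits;

/* Toy parser: the interesting code only runs for a 4-byte magic prefix. */
static void parse(const uint8_t *buf, size_t len)
{
    if (len >= 4 && memcmp(buf, "\x7f" "SSL", 4) == 0)
        deep_hits++;        /* the corner a blind fuzzer won't find */
    else
        shallow_hits++;
}

/* Open-loop fuzzing: random inputs, no coverage feedback. The odds of
 * guessing the magic are ~2^-32 per attempt. A coverage-directed fuzzer
 * would instead mutate toward inputs that make partial progress. */
static void blind_fuzz(unsigned trials)
{
    srand(12345);           /* fixed seed keeps the demo deterministic */
    for (unsigned i = 0; i < trials; i++) {
        uint8_t buf[8];
        for (size_t j = 0; j < sizeof buf; j++)
            buf[j] = (uint8_t)(rand() & 0xff);
        parse(buf, sizeof buf);
    }
}
```

After 100,000 blind trials, virtually every input lands in the shallow branch; without coverage reports, nothing tells you the deep branch was never exercised.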
Using parsers for untrusted input in C is a legacy of when this was written. Requiring the parsing portion (or all of OpenSSL, for that matter) to be rewritten in Rust or whatever new language is a massive change, given how long the OpenSSL project has been around.
You can easily write a robust parser in C. Just don't write a clump of code that interleaves pointer manipulation for scanning the input, writing the output, and doing the parsing per se.
* Have a stream-like abstraction for getting or peeking at the next symbol (and pushing back, if necessary). Make it impervious to abuse; under no circumstances will it access memory beyond the end of a string or whatever.
* Have some safe primitives for producing whatever output the parser produces.
* Work only with the primitives, and check all the cases of their return values.
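The steps above can be sketched as a stream abstraction (the names here are mine, not from any particular library). Every access goes through peek/next, which by construction cannot read past the end of the input:

```c
#include <assert.h>
#include <stddef.h>

enum { STREAM_EOF = -1 };

typedef struct {
    const char *buf;   /* the input being parsed */
    size_t len;        /* its length */
    size_t pos;        /* current read position */
} stream;

/* Look at the next symbol without consuming it; never reads out of bounds. */
static int stream_peek(const stream *s)
{
    return s->pos < s->len ? (unsigned char)s->buf[s->pos] : STREAM_EOF;
}

/* Consume and return the next symbol, or STREAM_EOF at end of input. */
static int stream_next(stream *s)
{
    int c = stream_peek(s);
    if (c != STREAM_EOF)
        s->pos++;
    return c;
}

/* Push the last symbol back, if any. */
static void stream_unget(stream *s)
{
    if (s->pos > 0)
        s->pos--;
}
```

A scanner written as `while ((c = stream_next(&s)) != STREAM_EOF) ...` simply cannot overrun, and every call site is forced to consider the end-of-input case.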
Because there isn't a good way of distributing pre-compiled cross-platform C libraries. So if you want to use a parsing library written in Rust, for example, you'd need to add Rust to your toolchain, which is a pain.
One solution to this problem would be to write an LLVM backend that outputs C. Maybe such a thing already exists.
Loads of bugs aren't detected by fuzz testing, as the technique exhibits stochastic behaviour: you'll most likely find bugs overall, but have varying chances (including none at all) of uncovering any specific bug.
Which is great news for those of us who approach such research by gaining a deep understanding of the code and the systems it exists in, and figuring out vulnerabilities from that perspective. An overreliance on fuzzing keeps us employed.
Fuzz testing has a very high chance of detecting bugs, especially this kind, but you do need to at least check that the fuzzer is reaching the relevant code!
Fuzz tests can take a seed corpus of test vectors. If the test framework tries them first, it can guarantee that it will find those bugs in any test run. For anything beyond that, it depends on chance.
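That "seeds first" idea can be sketched as follows (parse_ok and the seed strings are invented stand-ins): the seed vectors are replayed deterministically on every run, before any randomized mutation begins, so a regression in any seed is guaranteed to be found:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the code under test: accept input only if its double
 * quotes are balanced. */
static int parse_ok(const char *in)
{
    int quotes = 0;
    for (; *in; in++)
        if (*in == '"')
            quotes++;
    return quotes % 2 == 0;
}

/* Known-interesting test vectors, tried before any random inputs. */
static const char *seeds[] = { "\"closed\"", "\"unclosed", "plain" };

/* Replay every seed deterministically; a real harness would assert the
 * expected outcome of each, then hand the seeds to the fuzzer as its
 * starting corpus for mutation. */
static int run_seeds(void)
{
    int accepted = 0;
    for (size_t i = 0; i < sizeof seeds / sizeof seeds[0]; i++)
        accepted += parse_ok(seeds[i]);
    return accepted;
}
```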
> I think we should give the developers the benefit of doubt and assume they were acting in good faith and try to see what could be improved.
I feel like there is this trend of assuming any harsh criticism is bad faith. Asking why industry standard $SECURITY_CONTROL didn't work immediately after an issue happened that should have been caught by $SECURITY_CONTROL is hardly a bad faith question.
Questions themselves are not good-faith or bad-faith. People asking questions are doing so in either good-faith or bad-faith.
Someone pushing hard on legitimate criticisms with the intent of attacking a project or its members is acting in bad faith, while someone ignorant with a totally bogus criticism could be acting in good faith. Many bad-faith actors hide behind a veneer of legitimacy by disguising their motivations or shifting the gaze away from them.
> The actual problem is that open source, widely used code is a target for hackers.
The long-term solution is likely using languages which are (A) memory safe and (B) amenable to formal verification. Being widely used and open source isn't an issue if there are no exploitable bugs in the code.
There are a limited number of people willing to spend a limited amount of time fuzzing, reviewing, and scrutinizing crypto libraries. The more libraries exist, the more their efforts are divided, and the total scrutiny each library receives decreases. How would this help the problem?
I find it quite brave to trust the OpenBSD guys by default. Historically, they have forked far too many huge projects (Apache, gcc, patched clang, ...) to understand them all in depth.
OpenSMTPD had its fair share of exploits. sudo had its fair share of exploits.
fulafel | 3 years ago
(I guess that semantics can also be seen as a formally verified property)
germandiago | 3 years ago
I think interfaces in Botan, to give an example, are way easier to use.
The OpenSSL API looks like a minefield to me.
MuffinFlavored | 3 years ago
It's not realistic to enforce a unit-test-coverage percentage on a project at the scale of OpenSSL, right?
w_for_wumbo | 3 years ago
Is anyone able to explain this to me?
sitkack | 3 years ago
Why are people still using parsers for untrusted input in C? That is the real flaw here, not how the fuzzing was done.
KingLancelot | 3 years ago
All this bickering over language misses the real problem.
The actual problem is that open source, widely used code is a target for hackers.
By using one library everywhere for everything, you're painting a target on your own back.
The real solution is that the software ecosystem needs more competition and decentralization.
Use alternative crypto libraries.
If you want a drop-in replacement, use LibreSSL, which the OpenBSD guys forked and cleaned up after Heartbleed.
But the long-term solution is more competition: smaller, more specialized libraries, or even writing your own.