top | item 37163980

Fuzz testing: the best thing to happen to our application tests

103 points | bluestreak | 2 years ago | questdb.io

24 comments


todd8|2 years ago

I was working on Distributed Services for AIX in 1986 and 1987, a distributed filesystem to compete with NFS. As this was being developed by a dev team, my colleague and I pondered how to test the system that we had architected.

There are so many possible states that a file system can be in. Were the conventional tests going to catch subtle bugs? Here's an example of an unusual but not unheard-of issue: in a Unix system a file can be unlinked, and hence deleted from all directories, while remaining open by one or more processes. Such a file can still be written to and read by multiple processes, at least until it is finally closed by all processes holding open file descriptors, at which point it disappears from the system. Does the distributed file system model this correctly? Many other strange combinations of system calls might be applied to the file system. Will the tests exercise these?
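On a POSIX system the unlink-while-open behavior described above can be demonstrated in a few lines (a minimal sketch; `tempfile.mkstemp` is used only to get a scratch file):

```python
import os
import tempfile

# POSIX unlink-while-open semantics: os.unlink removes the directory entry,
# but a process holding an open descriptor can keep reading and writing the
# file; the storage is reclaimed only when the last descriptor is closed.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello")
os.unlink(path)                      # gone from every directory...
assert not os.path.exists(path)      # no name refers to it any more
os.lseek(fd, 0, os.SEEK_SET)
assert os.read(fd, 5) == b"hello"    # ...yet still readable via the fd
os.write(fd, b" world")              # and still writable
os.close(fd)                         # now the data actually disappears
```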

It occurred to me that the "correct" behavior for any sequence of system calls could be defined by just running the sequence on a local file system and comparing the results with running an identical sequence of system calls against the distributed file system.

I built a system to generate random sequences of file-related system calls and run them both on a local file system and on the remote distributed file system that I wanted to test. As soon as a difference in outcome resulted, the test would halt and save a log of the sequence of operations.

My experience with this test rig was interesting. At first, discrepancies happened right away. As each bug was fixed by the dev team, we would start the stochastic testing again, and then a new bug would be found. Over time the test would run for a few minutes before failure, then a few minutes longer, and finally for hours and hours. It was a really interesting and effective way to find some of the more subtle bugs in the code. I don't recall if I published this information internally or not.

jacquesm|2 years ago

That's very much my experience: an asymptotic reduction in bug incidence and a matching increase in runtime between subsequent errors, to the point that bugs become so rare that you think you have fixed all of them. But that's usually an illusion: it's just that the error rate is now so low that you no longer observe incidents yourself. The only way to get past that hurdle is to do the bad thing: release, hope that you got it right, and hope that if you didn't, the incidents will not be too bad.

You could run many tests in parallel to reduce the chance but it will never be completely zero. Writing bug free software this way is hard. The better way is to design it from the ground up with a bunch of instrumentation that keeps all of your invariants under close observation and that stops the moment anything is not according to your assumptions. This usually gets you to a high level of confidence that things really do work as designed. But of course, that also isn't perfect and residual risk (and residual bugs...) will always remain in any system of even moderate complexity. File systems are well above that level, especially distributed file systems.

rwmj|2 years ago

The key to modern fuzzing is feedback, usually some kind of coverage measurement of the program under test. This allows the fuzzer to be much smarter about how it finds new code paths and discards inputs that don't extend coverage. This makes fuzzing find bugs a lot quicker.

Google have a project to do fuzzing on Linux system calls using coverage feedback: https://github.com/google/syzkaller
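The feedback loop itself is simple enough to sketch in a few lines. Everything below is invented for illustration: a toy `target` that reports its own branch coverage stands in for real compiler instrumentation, and the "bug" is reached only via a specific two-byte prefix:

```python
import random

def target(data: bytes):
    """Toy instrumented program under test: returns the set of branches it
    hit, and crashes when a specific two-byte prefix is present."""
    cov = set()
    if len(data) > 1:
        cov.add("len>1")
        if data[0] == ord("F"):
            cov.add("saw-F")
            if data[1] == ord("U"):
                raise RuntimeError("bug reached")
    return cov

def fuzz(max_iters=200_000, seed=0):
    """Coverage-guided loop: mutate corpus entries, keep any input that
    reaches a branch not seen before, stop on a crash."""
    rng = random.Random(seed)
    corpus = [b"seed"]       # initial corpus
    seen = set()             # union of all coverage observed so far
    for _ in range(max_iters):
        data = bytearray(rng.choice(corpus))
        data[rng.randrange(len(data))] = rng.randrange(256)  # 1-byte mutation
        data = bytes(data)
        try:
            cov = target(data)
        except RuntimeError:
            return data                     # crashing input found
        if not cov <= seen:                 # new branch hit: keep this input
            seen |= cov
            corpus.append(data)
    return None
```

Without the coverage check, hitting the crash would require guessing two specific bytes at once; with it, the fuzzer keeps the input that got the first byte right and only has to find the second.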

PhilipRoman|2 years ago

I used this strategy for implementing a regex engine. I wanted to completely imitate the Lua pattern implementation, so I generated random patterns, ran them on random strings and compared the results.

It was very pleasant to work with such a system. Nowadays I would probably fuzz the patterns with AFL somehow.
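The same comparison can be sketched with Python's `re` as the reference oracle and a deliberately tiny engine under test (`toy_fullmatch` is invented for illustration; it supports only literals and `.`, a subset on which the two implementations should agree):

```python
import random
import re

def toy_fullmatch(pattern, s):
    """Hypothetical engine under test: literals plus '.' (any one char)."""
    if len(pattern) != len(s):
        return False
    return all(p == "." or p == c for p, c in zip(pattern, s))

def differential_round(rng):
    # Random pattern over a tiny alphabet; '.' is the only metacharacter
    # generated, so re.fullmatch is a trustworthy oracle for this subset.
    pattern = "".join(rng.choice("ab.") for _ in range(rng.randrange(1, 6)))
    s = "".join(rng.choice("ab") for _ in range(rng.randrange(0, 6)))
    expected = re.fullmatch(pattern, s) is not None
    return toy_fullmatch(pattern, s) == expected

rng = random.Random(42)
assert all(differential_round(rng) for _ in range(10_000))
```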

5440|2 years ago

For those of you in FDA regulated devices, my clients started receiving FDA NSE letters for not performing fuzz testing. For example, "Though you have provided penetration testing, it does not appear that you have addressed the other items identified such as static and dynamic code analysis, malformed input (fuzz) testing, or vulnerability scanning. This testing is necessary to assess the effectiveness of the cybersecurity controls implemented and to determine whether the residual risk of your device is acceptable."

jacquesm|2 years ago

That's excellent that they are doing that. Especially for embedded devices because there tend to be lots of homebrew protocols on those, and those are usually easy pickings.

Uw7yTcf36gTc|2 years ago

If their penetration testing didn’t perform fuzzing, then you may want to look into a new pen test provider. Fuzz testing is a default part of most pen tests (I do this professionally).

yosefk|2 years ago

The relative rarity of input (pseudo-)randomization in software testing is nearly inexplicable to me, except by how little all but the most commonly reproducing bugs cost the software vendor.

mqus|2 years ago

In the regular test suite (think CI) you want predictable results: running the tests again and again on the same code should give the same outcome, so you can see exactly which code change broke things. Maybe it's simpler to explain it the other way around: for every new path your fuzzer (or other randomized test) exercises, it also skips a path it tested in a previous run, and you probably want to add the failing paths it found to your regular test suite.

Don't get me wrong, we should have more randomization, but it's not good everywhere, which might explain why we don't have as much of it.
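One common pattern that reconciles the two goals is sketched below with a hypothetical `slug` function; the invariants and the seed handling are assumptions for illustration, not anything from the thread. Fuzzer-found failures get pinned as deterministic regression cases, while the randomized part stays seeded and logged so CI remains reproducible:

```python
import os
import random

def slug(s: str) -> str:
    """Hypothetical function under test."""
    return "-".join(s.lower().split())

def check(s: str) -> None:
    out = slug(s)
    assert " " not in out                       # invariant: no spaces survive
    assert out == slug(out.replace("-", " "))   # invariant: stable round trip

# 1. Regression cases: inputs a past fuzzing run flagged, now pinned in CI
#    so they are re-checked deterministically on every run.
REGRESSIONS = ["Hello World", "  leading", "a  b"]

def test_regressions():
    for s in REGRESSIONS:
        check(s)

# 2. Randomized cases: the seed comes from the environment and is printed,
#    so a CI failure can be replayed locally and then pinned above.
def test_random(seed=int(os.environ.get("FUZZ_SEED", "0"))):
    print("FUZZ_SEED =", seed)
    rng = random.Random(seed)
    for _ in range(1000):
        s = "".join(rng.choice("ab c") for _ in range(rng.randrange(0, 12)))
        check(s)
```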

rwmj|2 years ago

I love fuzzing as a technique and use it quite regularly; I'm even the maintainer of AFL++ in Fedora. But running AFL++ on even a single program occupies all threads of a high-end AMD server for weeks. I'm running it locally, so I'm merely paying for the electricity; if it were a cloud instance it would cost a small fortune. I think this is a reason it is not used more widely. In addition, most CI systems assume the tests will run in a small, finite amount of time, not for weeks on end.

I will note that Google have a programme for doing fuzz testing on open source projects using compute from their cloud: https://google.github.io/oss-fuzz/

kragen|2 years ago

hardware people keep saying that for some reason

maybe someday software people will listen

that would be a good day

jerrinot|2 years ago

Hello, I'm Jaromir, one of the core engineers on the QuestDB team. I just noticed this blog post is trending! Andrei, the author, lives in Bulgaria and is probably already asleep. Happy to answer any questions the post left unanswered.

yosefk|2 years ago

What tests did you have before adding fuzzing?

vodou|2 years ago

(Sorry for hijacking the thread with a general fuzz question.)

I want to do fuzz testing on a library/framework written in C++. The actual target to test against is a simulator that takes input from a socket according to a network protocol. This simulator is built on both Linux and Windows.

What fuzzing frameworks would you recommend for this? There are quite a few, and it's not always easy for me (as a fuzzing beginner) to understand the differences between them. Some of them seem to be abandoned projects as well.