top | item 27507523

(no title)

> Classic fuzzers famously have a hard time circumventing magic values and checksums, and cryptography is full of these. This is further complicated by the fact that the double ratchet algorithm is very stateful and depends on the two ratchets evolving in lockstep.

Can the first issue be solved by scraping all the magic values from the codebase and putting them in the fuzzer's dictionary file? I wonder if this could even be automated when doing white-box fuzzing -- have the fuzzer scan the code when placing instrumentation and extract every "interesting" constant from e.g. `if` checks.

Regarding the second issue and the double ratchet specifically, are there even ideas on how it could be approached for feasible fuzzing?

discuss

TonyTrapp|4 years ago

AFL++ already does that (it scans occurrences of strcmp, memcmp, etc. for magic values). The problem here might rather be that those magic constants are not expected to show up in the bit stream being read, but rather being the result of a calculation. If the result of a calculation must be 42 for a function to continue its execution, it's not so useful to have 42 in the fuzzer dictionary. If 42 is expected to be read from the file instead, it's easier and a dictionary will help.

infogulch|4 years ago

Perhaps the solution is to extend the idea of the fuzzer dictionary to include not only byte string constants but also functions. Throw in magic values, yes, but also sha-3 and the main primitive for the ratchet algorithm, etc.

That way the fuzzer doesn't have to reinvent hashing and encryption algorithms from scratch, which I expect would be as impossible as generating the preimage of a hash or outright breaking the encryption. Maybe those should be fuzzed too, but let's not swallow the whole cow at once when we just want to fuzz one protocol.

dkasak|4 years ago

Author here. As one of the other comments mentions, afl++ (and to some extent vanilla afl) already has capability to automatically scrape magic values from arguments to special functions like `strcmp` and the like. The older technique is called libtokencap (https://github.com/AFLplusplus/AFLplusplus/blob/stable/utils...), but afl++ also has a newer feature called AUTODICT (https://github.com/AFLplusplus/AFLplusplus/blob/stable/instr...).

But this only solves the problem of magic constants expected in the input. If the check depends on dynamic properties of the input or happens deeper in the code after the input's already been through some transformations, it can't be solved like this. There are other techniques to help with this, though. One of the earlier attempts to solve such types of more complex checks is called laf-intel (https://github.com/AFLplusplus/AFLplusplus/blob/stable/instr...) and boils down to transforming a more complex check into a nested series of simpler checks. This makes it more probable that the fuzzer's random mutation will be able to solve the outer check and hence hit new coverage, enabling the fuzzer to detect the mutation as productive.

afl++ has a more modern variant of this called CmpLog (https://github.com/AFLplusplus/AFLplusplus/blob/stable/instr...) which is based on the RedQueen technique. The paper for RedQueen is a really interesting read: https://www.syssec.ruhr-uni-bochum.de/media/emma/veroeffentl...

The problem of checksums is at times also solved by simply modifying the binary so that the checksum is neutralized and always succeeds, especially if you have access to source code.

As for the problem of fuzzing stateful things like the double ratchet, one way of tackling the problem is to think of the input to the fuzzer as not only the raw bytes that you'll be passing to the program you're fuzzing, but as a blueprint specifying which high-level operations you'll be performing on the input. Then you teach your fuzzer to be smarter and be able to perform a bunch of those operations.

So, let's say you take 512 bytes as the input to the fuzzer. You treat the first 256 bytes as the message to decode and the latter 256 bytes as the high-level cryptographic operations to perform on this message, each byte specifying one of those operations. So you could say a byte of value 1 represents the operation "ENCRYPT WITH KEY K1", 2 represents "ENCRYPT WITH KEY K2", 3 represents "DECRYPT WITH KEY K1", 4 represents "DECRYPT WITH KEY K2", 5 represents "PERFORM SHA2" and so on. Now you can feasibly end up with a sequence which will take a message encrypted with key K1, decrypt it, modify the message, then re-encrypt with key K2. Or, in the case of the double ratchet algorithm, have it perform multiple successive encryption steps to evolve the state of the ratchet and be able to fuzz more deeply.

Of course, the encoding needs to be rather dense for this to work well so that ideally each low-level bit mutation the fuzzer does on an input still encodes a valid sequence of valid high-level operations.

touisteur|4 years ago

Yeah protocol-based fuzzing is fun with afl(++), especially with the recent strides in grammar-based fuzzing. The afl++ is doing amazing work.