rrnewton | 3 years ago

(1) rr [formerly Mozilla rr]

We're big fans of rr!

Hermit is different in that creating deterministic OS semantics is different from recording whatever nondeterministic behavior occurs under normal Linux. BUT, there's a lot of overlap. And indeed `hermit record` is straight up RnR (record & replay).

But Hermit's RnR is not nearly as developed as rr. We integrate with gdb/lldb as an (RSP) debugger backend, just like rr. Any failing execution you can create with hermit, you can attach a debugger to. But our support is very preliminary, and you'll probably find rough edges. Also, we don't support backwards stepping yet (except by running again).

If we invest more in using Hermit as a debugger (rather than for finding and analyzing concurrency bugs), then there should be some advantages over traditional RnR. These would relate to the fact that deterministically executing is different from recording. For example, process and thread IDs, and memory addresses all stay the same across multiple runs of the program, even as you begin adding printfs and modifying the program to fix the bug. With traditional RnR, you can play the same recording as many times as you like, but as soon as you take a second recording all bets are off wrt what is the same or different compared to the prior recording. (That includes losing the "mental state" of things like tids & memory addresses, which is a good point Robert O'Callahan makes about the benefits of RnR when accessing the same recording multiple times.)

(2) libTAS - no we haven't! Checking it out now.

(3) Yes, definitely issues with CPU portability.

In general, we are interested in not just determinism on the same machine, but portability between machines in our fleet. As with any tech company that uses the cloud, at Meta people are usually trying to debug an issue on a different machine than where the problem occurred. I.e. taking a crash from a production or CI machine to a local dev machine.

The way we do this is that we mostly report a fairly old CPU to the guest, which disables certain features IF the guest is well behaved.

With the current processor tech, I don't think there's any way we can stop an adversarial program, which, for example, would execute CPUID, find that RDRAND is not supported on the processor, but then execute RDRAND anyway. We could build a much more invasive binary-instrumentation based emulator that would be able to enforce these kinds of rules at the instruction granularity, but it would have higher overhead, especially startup overhead. The nice thing about Reverie though is that we (or others) can add different instrumentation backends while keeping the same programming instrumentation API. So we could have a "hardened" backend that was more about sandboxing and reverse-engineering adversarial software, making a different tradeoff with respect to performance overhead.
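
To make the well-behaved vs. adversarial distinction concrete, here is a rough Python model of it. This is only an illustration, not how Hermit or Reverie implement it: the function names and the `host_ecx` value are made up for the example. The one hardware fact it relies on is that CPUID leaf 1 reports RDRAND support in ECX bit 30.

```python
# Toy model of CPUID feature masking. All names here are hypothetical;
# this is not Hermit's or Reverie's actual code.

# CPUID leaf 1 reports RDRAND support in ECX bit 30.
RDRAND_BIT = 1 << 30


def mask_cpuid_leaf1_ecx(host_ecx: int, hidden: int = RDRAND_BIT) -> int:
    """Return the ECX value shown to the guest, with hidden feature bits cleared."""
    return host_ecx & ~hidden & 0xFFFFFFFF


def well_behaved_source(reported_ecx: int) -> str:
    """A well-behaved guest only uses RDRAND if CPUID says it is available."""
    return "rdrand" if reported_ecx & RDRAND_BIT else "fallback"


def adversarial_source(reported_ecx: int) -> str:
    """An adversarial guest ignores CPUID and executes RDRAND regardless."""
    return "rdrand"


# Hypothetical host feature word: RDRAND (bit 30) plus SSE3 (bit 0).
host_ecx = RDRAND_BIT | (1 << 0)
guest_ecx = mask_cpuid_leaf1_ecx(host_ecx)

print(well_behaved_source(guest_ecx))
print(adversarial_source(guest_ecx))
```

Masking the reported bits only changes what the guest is told. Nothing in this model (or in CPUID interception alone) stops the second guest from issuing the instruction, which is why enforcing it would need instruction-level instrumentation.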


carterschonwald|3 years ago

Very cool to see the stuff you've been working on become public!