top | item 33713672

(no title)

rrnewton | 3 years ago

Well, we don't prove the absence of concurrency bugs -- that would be more a job for formal verification, type systems, at the source level.

But we can tell when our `--chaos` stress tests cease to produce crashes in reasonable numbers of runs. And when we do achieve a crash we can use our analysis phase to identify the racing operations.

It's both a pro and a con of the approach that we work with real crashes/failures. This means its a less sensitive instrument than tools like TSAN (which can detect data races that never cause a failure in an actual run), but conversely we don't have to worry about false positives, because we can present evidence that a particular order of events definitely causes a failure. Also we catch a much more general category of concurrency bugs (ordering problems between arbitrary instructions/syscalls, even between processes and in multiple languages).

discuss

No comments yet.