rrnewton | 3 years ago
But we can tell when our `--chaos` stress tests stop producing crashes within a reasonable number of runs. And when we do get a crash, we can use our analysis phase to identify the racing operations.
It's both a pro and a con of the approach that we work with real crashes/failures. This means it's a less sensitive instrument than tools like TSAN (which can detect data races that never cause a failure in an actual run), but conversely, we don't have to worry about false positives, because we can present evidence that a particular order of events definitely causes a failure. Also, we catch a much more general category of concurrency bugs (ordering problems between arbitrary instructions/syscalls, even between processes and in multiple languages).
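To make the "real failures, replayable evidence" trade-off concrete, here is a toy sketch in Python. It is not how `--chaos` is implemented; it assumes a simplified model where a seeded random scheduler picks which of two workers runs next, and the workers do a non-atomic read-modify-write on a shared counter. The names `run_schedule` and `stress` are hypothetical. Because the seed fully determines the interleaving, any failing seed replays the exact same order of events, which is the kind of evidence that rules out false positives.

```python
import random


def run_schedule(seed, n_increments=3):
    """Run two workers that each do n_increments non-atomic increments.

    A seeded RNG picks the next runnable worker at every step (a stand-in
    for a randomized "chaos" scheduler), so the same seed always yields
    the same interleaving. Returns (final_counter, trace).
    """
    rng = random.Random(seed)
    counter = 0
    # Each increment is two separate steps: read the counter, then write local + 1.
    workers = [{"pc": 0, "local": None, "ops": ["read", "write"] * n_increments}
               for _ in range(2)]
    trace = []
    while any(w["pc"] < len(w["ops"]) for w in workers):
        runnable = [i for i, w in enumerate(workers) if w["pc"] < len(w["ops"])]
        i = rng.choice(runnable)  # random scheduling decision
        w = workers[i]
        op = w["ops"][w["pc"]]
        if op == "read":
            w["local"] = counter
        else:
            counter = w["local"] + 1  # may overwrite the other worker's update
        trace.append((i, op))
        w["pc"] += 1
    return counter, trace


def stress(trials, n_increments=3):
    """Try `trials` seeds and return the ones that lose an update."""
    expected = 2 * n_increments
    return [seed for seed in range(trials)
            if run_schedule(seed, n_increments)[0] != expected]


if __name__ == "__main__":
    failing = stress(1000)
    print(f"{len(failing)} of 1000 schedules lost an update")
    if failing:
        result, trace = run_schedule(failing[0])
        print(f"seed {failing[0]} -> counter={result}, trace={trace}")
```

A real tool would search over interleavings of actual instructions and syscalls rather than a hand-written op list, but the shape is the same: a run either fails or it doesn't, and a failing seed is a concrete, re-runnable witness. The flip side, as with the comment's point about TSAN, is that a race whose bad interleaving is never sampled goes unreported.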