berkay|4 years ago
What if it were easy to see why a test is flaky, by comparing failed and successful test runs the way you compare a code diff? Would that be useful?
This is what we're building at Thundra (the Foresight product): we instrument the tests as well as the backend services so devs can quickly diagnose failing and flaky tests. Would appreciate any feedback you may have, here or privately.
ncmncm|4 years ago
This is tricky to implement, for several reasons.
Log output is normally timestamped, making every line unique. Those parts of log lines would need to be ignored when comparing between runs.
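A minimal sketch of that normalization step, assuming the logs carry an ISO-8601-style timestamp prefix (the regex, function names, and sample log lines below are all made up for illustration; real log formats vary widely):

```python
import difflib
import re

# Hypothetical pattern for a leading "YYYY-MM-DD HH:MM:SS[.fff]" timestamp.
TS = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?\s*")

def normalize(lines):
    """Strip leading timestamps so identical events compare equal across runs."""
    return [TS.sub("", line) for line in lines]

def log_diff(passed, failed):
    """Unified diff of two runs' logs after timestamp normalization."""
    return list(difflib.unified_diff(
        normalize(passed), normalize(failed),
        fromfile="passed", tofile="failed", lineterm=""))

passed = ["2021-05-01 10:00:01 connecting", "2021-05-01 10:00:02 ok"]
failed = ["2021-05-01 11:30:09 connecting", "2021-05-01 11:30:15 timeout"]
print("\n".join(log_diff(passed, failed)))
```

With the timestamps stripped, the shared "connecting" line falls out of the diff and only the real divergence ("ok" vs. "timeout") is shown.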
Log output ordering is often indeterminate, particularly when a test has multiple threads, or interacts with an external service. Often the order of events logged is an essential feature of the difference between a successful and failed run. But some or most order differences are just incidental. The number of logged events may vary incidentally, or significantly.
Explaining all these differences in detail to the test system would be too hard. So, the system needs to discover as much as possible of this for itself, and represent these discoveries symbolically. Then, allow a test to be annotated to override default judgments about the diagnostic significance of these features.
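One crude first cut at separating incidental order differences from real content differences is to also compare the runs as multisets of events. This sketch (the function name and labels are my own invention, not anyone's product) only distinguishes "same events, different order" from "different events"; deciding which reorderings actually matter is the hard part the comment describes:

```python
from collections import Counter

def classify_difference(run_a, run_b):
    """Compare two runs' (already-normalized) event lists.

    Returns "identical" if the sequences match exactly,
    "reordered" if the same events occur in a different order
    (often, but not always, incidental), and "content" if the
    multisets of events actually differ.
    """
    if run_a == run_b:
        return "identical"
    if Counter(run_a) == Counter(run_b):
        return "reordered"
    return "content"

print(classify_difference(["connect", "send", "ack"],
                          ["connect", "ack", "send"]))   # reordered
print(classify_difference(["connect", "send", "ack"],
                          ["connect", "send", "timeout"]))  # content
```

A real system would then need per-test annotations, as suggested above, to promote specific "reordered" cases to diagnostically significant ones.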