top | item 21910896

(no title)

joemag | 6 years ago

For those of us who build distributed systems, it’s all about number of modes we need to design for, test, and monitor. If all I have to worry about is process death, I can design my service around that. Monitoring for process death is generally pretty straightforward as well.

Gray failures (like process slowdowns), on the other hand, are fairly difficult to design for, and detect. And it can wreak havoc on distributed systems if one of the nodes suffers a gray failure.

discuss

No comments yet.