dastbe | 4 days ago

One thing to add for passive healthchecking and client-side load balancing: throughput and dilution of signal really matter.

There are obviously plenty of low or sparse call volume services where passive healthchecks would take forever to get signal, or where signal is collected so infrequently that it's meaningless. Even with decent throughput, say 1M RPS spread evenly between 1000 caller replicas and 1000 callee replicas, any one caller-callee pair only sees 1 RPS. Depending on your noise threshold, a centralized active healthcheck can respond much faster.
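To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes traffic is spread perfectly evenly across every caller-callee pair, and the function names, the "5 failures" threshold, and the failure rates are made up for illustration, not taken from any particular load balancer.

```python
def per_pair_rps(total_rps: float, callers: int, callees: int) -> float:
    """Requests per second seen by a single caller-callee pair,
    assuming traffic is spread evenly over all pairs."""
    return total_rps / (callers * callees)


def seconds_to_observe(failures_needed: int, pair_rps: float,
                       failure_rate: float = 1.0) -> float:
    """Expected seconds before one caller sees `failures_needed` failed
    requests to one callee. `failure_rate` is the fraction of requests
    to that callee that fail (1.0 means it is fully down)."""
    return failures_needed / (pair_rps * failure_rate)


pair = per_pair_rps(1_000_000, 1000, 1000)
print(pair)                                          # 1.0
print(seconds_to_observe(5, pair))                   # 5.0
print(seconds_to_observe(5, pair, failure_rate=0.1)) # 50.0
```

So even at 1M RPS overall, a callee that fails 10% of its requests takes on the order of a minute for any single caller to notice under these assumptions, and every caller has to find out separately.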

There are some ways to improve signal in the latter case using subsetting and aggregating/reporting controllers, but that all comes with added complexity.
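As a rough illustration of why those two approaches help, using the same 1M RPS, 1000 callers, and 1000 callees: the sketch below assumes evenly spread traffic, and the subset size of 20 and the function names are hypothetical. It only does the arithmetic; it is not how any specific controller works.

```python
def pair_rps_with_subsetting(total_rps: float, callers: int,
                             subset_size: int) -> float:
    """Per-pair request rate when each caller only talks to
    `subset_size` callees instead of all of them."""
    per_caller_rps = total_rps / callers
    return per_caller_rps / subset_size


def aggregated_rps_per_callee(total_rps: float, callees: int) -> float:
    """Request rate visible for one callee when a controller
    aggregates the outcomes reported by all of its callers."""
    return total_rps / callees


print(pair_rps_with_subsetting(1_000_000, 1000, 20))  # 50.0
print(aggregated_rps_per_callee(1_000_000, 1000))     # 1000.0
```

Subsetting concentrates each caller's traffic on fewer callees (50 RPS per pair instead of 1), and aggregation pools every caller's view of a callee (1000 RPS of signal). The cost is the extra moving parts: subset assignment and rebalancing in one case, a reporting pipeline and controller in the other.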
