paulsutter | 3 months ago
500 milliseconds is a very long interval on a CPU timescale. Funny how we all tend to judge intervals based on human timescales.
Of course, the best way to choose heartbeat intervals is based on metrics like transaction failure rate or latency.
hinkley|3 months ago
Automatic load balancing always gets weird: it can end up sending more traffic to the sick server instead of less, because that server's results come back faster. So you have to be careful with status codes.
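A rough sketch of that failure mode, assuming a balancer that simply picks the backend with the lowest observed latency. The `Backend` fields, the numbers, and the `error_penalty_ms` knob are all made up for illustration, not taken from any real load balancer:

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    avg_latency_ms: float  # observed mean response time
    error_rate: float      # fraction of responses that are failures (e.g. 5xx)


def naive_score(b: Backend) -> float:
    # Latency only: a server that fails fast looks like the best one.
    return b.avg_latency_ms


def status_aware_score(b: Backend, error_penalty_ms: float = 1000.0) -> float:
    # Hypothetical fix: charge every failed response a fixed latency penalty,
    # so fast errors no longer make a backend look attractive.
    return b.avg_latency_ms + b.error_rate * error_penalty_ms


def pick(backends, score):
    # Lower score wins.
    return min(backends, key=score)


backends = [
    Backend("healthy", avg_latency_ms=40.0, error_rate=0.0),
    Backend("sick", avg_latency_ms=2.0, error_rate=0.9),  # errors return quickly
]

naive_choice = pick(backends, naive_score).name
aware_choice = pick(backends, status_aware_score).name
```

With these invented numbers, the latency-only score routes to the sick backend, while the status-aware score routes to the healthy one.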
unknown|3 months ago
[deleted]
just_mc|3 months ago
roncesvalles|3 months ago
The relevant timescale here is not CPU time but network time. There's so much jitter in networks that if your heartbeats are on CPU scale (even, say, 100ms) and you wait for 4 missed before declaring dead, you'd just be constantly failing over.
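To put numbers on that, here is a minimal sketch assuming a detector that declares a peer dead once `missed_threshold` consecutive heartbeats fail to arrive. The 500 ms network stall is an invented example, and the function names are hypothetical:

```python
def detection_timeout_ms(interval_ms: int, missed_threshold: int) -> int:
    # Silence needed before the peer is declared dead.
    return interval_ms * missed_threshold


def false_failover(stall_ms: int, interval_ms: int, missed_threshold: int) -> bool:
    # A healthy peer is wrongly declared dead if the network stalls for at
    # least as long as the detection timeout.
    return stall_ms >= detection_timeout_ms(interval_ms, missed_threshold)


STALL_MS = 500  # assumed transient network stall, for illustration only

fast = detection_timeout_ms(100, 4)     # 100 ms heartbeats, 4 missed
slow = detection_timeout_ms(10_000, 4)  # 10 s heartbeats, 4 missed

fast_trips = false_failover(STALL_MS, 100, 4)
slow_trips = false_failover(STALL_MS, 10_000, 4)
```

Under these assumptions the 100 ms setting times out after 400 ms and fails over on a 500 ms stall, while the 10 s setting rides it out at the cost of taking 40 s to notice a real failure.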
paulsutter|3 months ago
4 x 10s heartbeats sounds like an incredibly conservative decision by whoever chose the default, and I can't imagine any critical service keeping those timeouts.
blipvert|3 months ago
nitwit005|3 months ago
7200
That is two hours in seconds.