igor47 | 5 days ago

Back in the day, I thought about this problem domain a lot! I even wrote and open-sourced a service discovery framework called SmartStack, an early precursor to later approaches like Envoy, described here: https://medium.com/airbnb-engineering/smartstack-service-dis...

This was a client-side framework, in the OP's parlance. What's missing from the OP is the insight that the server-side load balancer can also fail -- what will load balance the load balancers? We performed registration based on health checks from a sidecar, and then we also did client-side checks, which we called connectivity checks. Multiple client instances can disagree about the state of the world, because network partitions actually can result in different states of the world for different clients.
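A minimal sketch of the two-layer idea described above, in Python (names like `connectivity_check` and `usable_backends` are hypothetical, not SmartStack's actual API): the registry admits backends based on sidecar health checks, and each client additionally filters that list through its own connectivity probes, so two clients on opposite sides of a partition can legitimately end up with different views.

```python
import socket

def connectivity_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Client-side connectivity check: can THIS client reach the backend?
    Distinct from the sidecar's health check, because a network partition
    can leave a backend reachable from the registry but not from a
    particular client."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def usable_backends(registered, probe=connectivity_check):
    """Filter the registry's (healthy) backend list through this client's
    own probes. Each client computes its own view of the world."""
    return [(host, port) for host, port in registered if probe(host, port)]
```

The `probe` parameter is injectable so the filtering logic can be exercised without real network I/O.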

Finally, you also still need circuit breakers. Health checks are generally pretty broad, and when a single endpoint in a service begins having high latency, you don't want to bring down the entire client service with all of its capacity stuck making requests to that one endpoint. This specific example is probably more relevant to the old days of thread and process pools than to modern evented/async frameworks, but the broader point still applies.
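The circuit-breaker pattern the comment describes can be sketched per endpoint (this is a generic illustration, assuming a simple consecutive-failure threshold and a cooldown, not any particular library's implementation): after enough failures the endpoint is skipped for a while, so one slow endpoint cannot pin all of a client's threads or connections on requests that will time out.

```python
import time

class CircuitBreaker:
    """Per-endpoint circuit breaker (illustrative sketch): after
    max_failures consecutive failures, the endpoint is skipped for
    `cooldown` seconds, then allowed one probe request (half-open)."""

    def __init__(self, max_failures=5, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock          # injectable for testing
        self.failures = 0
        self.opened_at = None       # None means the breaker is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: permit a probe request once the cooldown elapses.
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()
```

The clock is passed in so breaker behavior can be tested deterministically without sleeping.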

discuss

order

singhsanjay12 | 5 days ago

> when a single endpoint in a service begins having high latency

Yes, I have seen this firsthand. Tracking latency per endpoint in a sliding window helped somewhat, but it created other problems for low-QPS services.
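A sketch of the sliding-window tracking described above, and of why it misbehaves at low QPS (names and thresholds here are illustrative, not from any particular system): with few requests in the window, a single slow sample dominates the percentile estimate and can eject an otherwise healthy endpoint, so a minimum-sample guard is one common hedge.

```python
from collections import deque

class LatencyWindow:
    """Sliding window of recent request latencies for one endpoint.
    At low QPS the window holds few samples, so one slow request can
    swing the percentile wildly -- the problem the comment describes.
    min_samples refuses to render a verdict on too little data."""

    def __init__(self, size=100, min_samples=10):
        self.samples = deque(maxlen=size)   # oldest samples fall off
        self.min_samples = min_samples

    def record(self, latency_ms: float):
        self.samples.append(latency_ms)

    def p99(self):
        if len(self.samples) < self.min_samples:
            return None  # not enough data to judge; don't eject
        ordered = sorted(self.samples)
        # Nearest-rank style index into the sorted samples.
        return ordered[int(0.99 * (len(ordered) - 1))]
```

Returning `None` below the sample floor pushes the "what do we do with sparse data" decision up to the caller, which is exactly the part that gets awkward for low-QPS services.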