(no title)

miggy | 6 months ago

Author here. Two quick thoughts:

1. As I covered in an earlier part of this series, service discovery is not always easy at scale. High churn, partial failures, and the cost of health checks can make it tricky to get right.

2. Using server-side metrics for load balancing is a great idea. In many setups, feedback is embedded in response headers or health check responses so the LB can make more informed routing decisions. Hodor at LinkedIn is a good example of this in practice: https://www.linkedin.com/blog/engineering/data-management/ho...
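To make the header-feedback idea in point 2 concrete, here is a minimal Python sketch of a load balancer folding a backend-reported load value into a smoothed per-backend estimate. Everything named here is an assumption for illustration: the `X-Backend-Load` header, the `LoadTracker` class, and the smoothing factor are hypothetical and not taken from Hodor or any specific load balancer.

```python
from typing import Dict, Iterable, Mapping

# Hypothetical header name; real systems pick their own feedback channel.
LOAD_HEADER = "X-Backend-Load"
# Smoothing factor for the moving average; an arbitrary choice for this sketch.
ALPHA = 0.3


class LoadTracker:
    """Keeps an exponentially weighted moving average of reported load."""

    def __init__(self) -> None:
        self._load: Dict[str, float] = {}

    def observe(self, backend: str, headers: Mapping[str, str]) -> None:
        # Plain dict lookup is case-sensitive; a real HTTP client would
        # normally hand over a case-insensitive header mapping.
        raw = headers.get(LOAD_HEADER)
        if raw is None:
            return
        try:
            sample = float(raw)
        except ValueError:
            return  # ignore malformed feedback instead of failing the request
        prev = self._load.get(backend)
        if prev is None:
            self._load[backend] = sample
        else:
            self._load[backend] = ALPHA * sample + (1 - ALPHA) * prev

    def least_loaded(self, backends: Iterable[str]) -> str:
        # Backends with no feedback yet are treated as load 0.0 (an assumption;
        # this favors new nodes and may need a warm-up policy in practice).
        return min(backends, key=lambda b: self._load.get(b, 0.0))
```

Usage would look roughly like calling `tracker.observe(backend, response_headers)` after each response and `tracker.least_loaded(candidates)` before each request. The moving average is one possible design choice: it keeps a single noisy sample from swinging routing decisions.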

ExoticPearTree | 6 months ago

I was thinking of something along the lines of a “map” of all the backends and their capabilities, recomputed every N seconds and atomically swapped in for the previous one. The LB would then be able to decide where to send a request and would also have a precomputed backup option in case the first choice becomes unavailable. You could also use those metrics to signal that a node needs to be drained of traffic, for example, so that no new connections are sent to it.
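A rough Python sketch of that idea, under stated assumptions: the metric fields, the scoring weights, and the names `BackendStats`, `RoutingSnapshot`, and `Router` are all hypothetical. The snapshot is built in full and then published with a single reference assignment, which in CPython means readers see either the old map or the new one, never a half-built one.

```python
from dataclasses import dataclass
from typing import AbstractSet, Dict, Optional


@dataclass(frozen=True)
class BackendStats:
    # Hypothetical per-backend metrics; field names are illustrative only.
    in_flight: int
    ttfb_ms: float
    healthy: bool
    draining: bool = False


@dataclass(frozen=True)
class RoutingSnapshot:
    # Immutable result of one recomputation: first choice plus a backup.
    primary: Optional[str]
    backup: Optional[str]


def score(stats: BackendStats) -> float:
    # Lower is better. The weighting is arbitrary, chosen for the example.
    return stats.in_flight * 10.0 + stats.ttfb_ms


def build_snapshot(metrics: Dict[str, BackendStats]) -> RoutingSnapshot:
    # Unhealthy and draining nodes never receive new connections.
    eligible = [n for n, s in metrics.items() if s.healthy and not s.draining]
    ranked = sorted(eligible, key=lambda n: score(metrics[n]))
    primary = ranked[0] if ranked else None
    backup = ranked[1] if len(ranked) > 1 else None
    return RoutingSnapshot(primary, backup)


class Router:
    def __init__(self, metrics: Dict[str, BackendStats]) -> None:
        self._snapshot = build_snapshot(metrics)

    def refresh(self, metrics: Dict[str, BackendStats]) -> None:
        # Meant to be called every N seconds by the caller's scheduler.
        # Build the new map completely, then swap one reference.
        new_snapshot = build_snapshot(metrics)
        self._snapshot = new_snapshot

    def pick(self, unavailable: AbstractSet[str] = frozenset()) -> Optional[str]:
        snap = self._snapshot  # read once so both choices come from one map
        for candidate in (snap.primary, snap.backup):
            if candidate is not None and candidate not in unavailable:
                return candidate
        return None
```

For example, `router.pick()` returns the primary, while `router.pick({"node-a"})` falls through to the precomputed backup if `node-a` was the primary and has just failed. Marking a node as `draining` in the next metrics pass removes it from both slots on the following refresh.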

I understand the complexities of running a large set of distributed services behind load balancers. I just think there could be a better way of choosing a backend than relying only on least requests, TTFB, and an OK response from a health check every N seconds.