top | item 29504945


kklimonda | 4 years ago

q3k has mentioned metallb+bgp, which is basically an in-cluster implementation of the LoadBalancer Service type (BGP speakers run on the k8s nodes and announce /32 routes for service IPs based on configuration), but it doesn't provide an answer for "stabilizing" ECMP connections when there are changes to backends. There has to be something "behind" metallb[1] that handles not only stable hashing for connections, but also keeps forwarding "in-flight" flows (like established TCP sessions) to the correct backends, even if packets arrive on different ingress nodes. It seems Cilium has a solution for that[2] (by both bundling metallb and having a Maglev-based load balancer implementation), but I haven't had time to dig into it, so I was curious whether someone else has solved it and would be willing to share stories from the front. This is one of those rough edges around Kubernetes deployments in bare metal environments and I'd love to see what can be done to make it more robust.

[1] metallb only really announces IPs, so that "behind" is probably just the CNI that actually handles traffic

[2] https://cilium.io/blog/2020/11/10/cilium-19#maglev
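For context on the "stable hashing" part: Maglev's core idea is to build a large lookup table from each backend's own pseudo-random permutation of slots, so that adding or removing a backend disturbs only a small fraction of the table (and hence of flow-to-backend mappings). Below is a toy Python sketch of that table-building step, based on my reading of how Maglev-style consistent hashing works; it is not Cilium's actual eBPF implementation, and the backend addresses and table size are made up for illustration:

```python
import hashlib


def _h(name: str, seed: str) -> int:
    """Stable 64-bit hash of a backend name under a given seed."""
    digest = hashlib.sha256(f"{seed}:{name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def maglev_table(backends: list[str], m: int = 65537) -> list[str]:
    """Build a Maglev-style lookup table of size m (m should be prime).

    Each backend gets a permutation of table slots derived from two
    hashes (offset, skip). Backends take turns claiming their next
    preferred empty slot until the table is full, which yields a nearly
    perfectly balanced table with minimal disruption on membership change.
    """
    perms = {}
    for b in backends:
        offset = _h(b, "offset") % m
        # skip in [1, m-1]; since m is prime, skip is coprime with m,
        # so (offset + i*skip) % m visits every slot exactly once
        skip = _h(b, "skip") % (m - 1) + 1
        perms[b] = (offset, skip)

    table: list[str | None] = [None] * m
    next_idx = {b: 0 for b in backends}
    filled = 0
    while filled < m:
        for b in backends:
            offset, skip = perms[b]
            # advance along b's permutation to its next empty slot
            while True:
                slot = (offset + next_idx[b] * skip) % m
                next_idx[b] += 1
                if table[slot] is None:
                    table[slot] = b
                    filled += 1
                    break
            if filled == m:
                break
    return table
```

A packet's 5-tuple hash modulo `m` then indexes the table to pick a backend. The point relevant to the thread: if one node drops out, rebuilding the table leaves most slots (and thus most flows) mapped to the same surviving backend, which is what plain per-router ECMP hashing does not guarantee. Keeping established flows pinned across ingress-node changes still needs connection tracking on top of this.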


bogomipz | 4 years ago

Ah OK, I missed that this was MetalLB specific. Interesting that Cilium is using Google's Maglev, which among other things handles the issue of ECMP churn when nodes are taken out of service. I remember reading about this in the white paper when it came out. I believe Facebook's Katran does something similar. Thanks for the link.