item 30772152

blaisio | 4 years ago

Yes! I think this is a really under-reported issue. It's basically caused by Kubernetes tearing pods down without confirming that everyone has responded to the prior status updates: the endpoint removal and the SIGTERM happen concurrently, so load balancers can keep sending traffic to a pod that is already shutting down. It affects every ingress controller, it also affects Services of type "LoadBalancer", and there isn't a real fix. Even if you add a timeout in the preStop hook, that still might not handle it 100% of the time. IMO it is a design flaw in Kubernetes.
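For reference, the preStop-hook workaround mentioned here is usually wired up like this; a minimal sketch, where the names, image, and the 30s duration are illustrative rather than anything from the thread:

```yaml
# Minimal sketch: delay SIGTERM with a preStop sleep so load balancers
# have time to observe the endpoint removal before the process exits.
apiVersion: v1
kind: Pod
metadata:
  name: example-api          # hypothetical name
spec:
  # Must be longer than the preStop sleep, since the grace period
  # countdown starts when the hook starts.
  terminationGracePeriodSeconds: 60
  containers:
  - name: app
    image: example/app:1.0   # hypothetical image
    lifecycle:
      preStop:
        exec:
          command: ["sleep", "30"]
```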


LimaBearz | 4 years ago

Not defending the situation, but with a preStop hook, at least in the case of APIs, k8s can handle it 100% of the time; it's just messy.

We have a preStop hook of 62s. 60s timeouts are set in our apps, 61s is set on the ALBs (ensuring the ALB is never the cause of the hangup), and 62s on the preStop to make sure nothing has come into the container in the last 62s.

Then we set a terminationGracePeriodSeconds of 60 just to make sure it doesn't pop off too fast. This gives us 120s where nothing happens and anything in flight can get to where it's going.
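A sketch of what that layering might look like in a manifest; this is my reconstruction, not the poster's actual config, and since the reply below notes that the grace period includes the preStop duration, it uses a value larger than the 60 stated here:

```yaml
# Reconstruction of the 60/61/62 layering. The 60s app timeouts live in
# application config, and the 61s ALB idle timeout is set on the ALB
# itself (e.g. via the AWS Load Balancer Controller annotation
# alb.ingress.kubernetes.io/load-balancer-attributes:
#   idle_timeout.timeout_seconds=61).
spec:
  # The grace period includes the preStop duration, so it must cover
  # the 62s sleep plus time for the app to drain after SIGTERM.
  terminationGracePeriodSeconds: 125
  containers:
  - name: api
    lifecycle:
      preStop:
        exec:
          command: ["sleep", "62"]  # outlasts the 61s ALB idle timeout
```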

chippiewill | 4 years ago

> Then we set a terminationGracePeriodSeconds of 60 just to make sure it doesn't pop off too fast.

I think the grace period includes the preStop duration, doesn't it?

thecosmicfrog | 4 years ago

Yep, same configuration here other than we use 60/65/70 for (admittedly) completely unscientific reasons.