Excluding kernel bugs, CPU limits just provide an upper bound on burst capacity, and that is how they control oversubscription of CPU on a node. As with any other resource that is oversubscribed against variable demand, there is a tradeoff. Letting a pod burst above its request is unreliable, because the spare capacity may not be there when it is needed, and the burst can also degrade neighboring pods. Whether that improves your cluster efficiency or introduces intolerably high variability in service latency and throughput depends on your mix of workloads and on how the scheduler distributes your pods.

Buffer's solution of having different flavors of node, onto which mutually compatible workloads are scheduled in isolation from incompatible ones, is a very reasonable thing to do, even if this particular case is a bit of a head-scratcher.
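To make the "limit = burst ceiling" arithmetic concrete, here's a rough Python sketch. It assumes Kubernetes on Linux with CFS bandwidth control, where the kubelet turns a CPU limit into a CFS quota over a 100 ms period by default (ignoring its small minimum-quota clamp). The `Pod` class and the helper functions are hypothetical, just for illustration:

```python
from dataclasses import dataclass
from typing import List, Optional

# Default CFS period the kubelet uses for CPU limits (100 ms), in microseconds.
CFS_PERIOD_US = 100_000


@dataclass
class Pod:
    # Hypothetical model of a pod's CPU settings, in millicores.
    name: str
    request_millicores: int
    limit_millicores: Optional[int]  # None means no CPU limit (unbounded burst)


def cfs_quota_us(limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (microseconds) the pod's cgroup may use per CFS period."""
    return limit_millicores * period_us // 1000


def burst_headroom(pod: Pod) -> Optional[int]:
    """Millicores a pod may use above its request; None if unbounded."""
    if pod.limit_millicores is None:
        return None
    return pod.limit_millicores - pod.request_millicores


def limit_oversubscription(pods: List[Pod], allocatable_millicores: int) -> float:
    """Sum of limits divided by node capacity.

    A value above 1.0 means that if every pod bursts to its limit at once,
    they cannot all get it, so bursting pods compete with their neighbors.
    """
    if any(p.limit_millicores is None for p in pods):
        return float("inf")  # at least one pod can burst without bound
    return sum(p.limit_millicores for p in pods) / allocatable_millicores


if __name__ == "__main__":
    pods = [
        Pod("a", request_millicores=500, limit_millicores=2000),
        Pod("b", request_millicores=1000, limit_millicores=2000),
        Pod("c", request_millicores=1000, limit_millicores=1000),
    ]
    print(cfs_quota_us(2000))                  # 200000
    print(burst_headroom(pods[0]))             # 1500
    print(limit_oversubscription(pods, 4000))  # 1.25
```

On a 4-core node the requests (2.5 cores) fit comfortably, but the limits add up to 5 cores. That 1.25x is the oversubscription you're betting won't all be claimed at the same moment, and grouping workloads onto separate node flavors is one way of choosing who shares that bet.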