I'd also strongly recommend this view of how Kubernetes uses cgroups, with similar drill-downs into how everything gets managed. Lovely view of what's really happening! https://martinheinz.dev/blog/91
Reading through the description of how Kubernetes uses cgroups, I can see both similarities and differences compared to our approach. It is interesting to compare the two.
cpu.max.burst increases the chances of noisy neighbours stealing CPU from other tenants.
Great article, thanks! I’ve been curious whether there are any scheduling optimizations for workloads that are extremely bursty, such as super-low-traffic websites or cron-job-type work, where you may want your database ‘provisioned’ all the time, but realistically it won’t get anywhere near even the 50% CPU minimum at any sustained rate. Presumably those could be hosted at even a fraction of the burst cost. Is that a use case Ubicloud has considered?
Thanks! That was a pleasant read. I have been wanting to mess with cgroups for a while, in order to hack together a "docker" like many have done before to understand it better. This will help!
jauntywundrkind|10 months ago
I've been a bit apoplectic in the past that cgroups seemed not super helpful in Kubernetes, but this really showed me how the different Kubernetes QoS levels are driven by similar juggling of different cgroups.
I'm not sure whether this makes use of cpu.max.burst or not. There's a fun article that monkeys with these cgroups directly, which is neat to see. It also links to a request for Kubernetes to support the new (5.14) CFS burst feature, which is a whole other fun rabbit hole of fair-share bursting to go down! https://medium.com/@christian.cadieux/kubernetes-throttling-... https://github.com/kubernetes/kubernetes/issues/104516
msarnowicz|10 months ago
We chose not to use cpu.weight, and instead divide the host explicitly using cgroups (slices in systemd). We put Standard VMs in dedicated slices to keep them isolated, and let several Burstable VMs share a slice. This provides a trade-off between the price of the VM and its resource guarantees.
We use cpu.max.burst to allow the VMs to "expand" a bit, while understanding that this creates a "noisy neighbor" problem. At the same time, each VM keeps a minimum CPU guarantee. cgroups provide all those knobs and give a lot of control; combining them in various ways is an interesting puzzle.
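As a rough sketch of the knobs involved (the slice name and numbers here are illustrative, assuming cgroup v2 mounted at /sys/fs/cgroup and a kernel of 5.14 or later; writes require root):

```shell
# Hypothetical slice shared by several Burstable VMs.
# Cap the whole slice at 2 CPUs: 200ms of CPU time per 100ms period.
echo "200000 100000" > /sys/fs/cgroup/burstable.slice/cpu.max

# Let the slice bank up to 50ms of unused quota as burst credit
# (cpu.max.burst; must not exceed the quota set above).
echo 50000 > /sys/fs/cgroup/burstable.slice/cpu.max.burst
```

With systemd, the quota would typically be set via the slice unit rather than written directly; the files above show what ends up in the cgroup hierarchy either way.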
__turbobrew__|10 months ago
I run multi-tenant k8s clusters with hundreds of tenants and it fundamentally is a hard problem to balance workload performance with efficiency. Sharing resources increases efficiency but in most cases increases tail latencies.
msarnowicz|10 months ago
I came into the Linux world via Postgres, and this was an interesting project for me learning more about Linux internals. While cgroups v2 do offer basic support for CPU bursting, the bursts are short-lived, and credits don’t persist beyond sub-second intervals. If you’ve run into scenarios where more adaptive or sustained bursting would help, we’d love to hear about them. Knowing your use cases will help shape what we build next.
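To make the sub-second credit behavior concrete, here is a toy model of the idea (not the kernel's actual accounting; the function name and the quota/burst numbers are made up): unused quota accrues as burst credit up to a cap, and a later busy period can spend quota plus accumulated credit.

```python
def simulate_burst(quota, burst_cap, demand_per_period):
    """Toy model of CFS burst accounting.

    Each period a group may run up to quota + credit; quota left
    unused adds to the credit (capped at burst_cap), while running
    beyond the base quota drains it.
    Returns the CPU time actually granted in each period.
    """
    credit = 0
    granted = []
    for demand in demand_per_period:
        available = quota + credit
        used = min(demand, available)
        granted.append(used)
        # Unused base quota tops the credit up; overage drains it.
        credit = min(burst_cap, max(0, credit + quota - used))
    return granted


# Quota 100 per period, burst cap 50: an idle period banks credit,
# letting the next busy period exceed the base quota once.
print(simulate_burst(100, 50, [20, 160, 160]))  # → [20, 150, 100]
```

The key limitation the comment describes falls out of the model: credit is bounded by the cap and is spent within a period or two, so a long idle stretch cannot fund a long burst.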
motrm|10 months ago
I particularly enjoyed the gentle exposition into the world of cgroups and how they work, the levers available, and finally how Ubicloud uses them.
Looking forward to reading how you handle burst credits over longer periods, once you implement that :)
Lovely work, Maciek!
parrit|10 months ago
Are there typical use cases where you reach for cgroups directly instead of using the container abstraction?