top | item 28355164

binarybanana | 4 years ago

What's the advantage of using the CFS quota mechanism over cgroup CPU shares? I've been using cgroups to successfully make heavy compile jobs unnoticeable by setting CPU shares to "0" (essentially idle priority) and limiting RAM to 2GB (again, a soft limit, so it uses as much as is available), plus 16GB of swap.

It has worked surprisingly well: I can compile Firefox on all cores while playing a game, and it's completely imperceptible that anything else is happening, both in terms of performance and interactivity/latency. No issues with this "burstiness" or stuck/deadlocked processes, either. "Nice" is basically useless in comparison.
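A minimal sketch of the cgroup v1 knobs behind a setup like this: near-zero CPU weight plus a soft memory limit. The file names are the v1 controller interface, and the values are the ones the comment describes; note the kernel clamps cpu.shares to a minimum of 2, so "0" effectively means "lowest possible weight".

```python
# Hypothetical sketch of the cgroup v1 settings for a background
# compile job: minimal CPU weight, soft (not hard) memory limit.

GIB = 1024 ** 3

def build_settings(cpu_shares=2, mem_soft_bytes=2 * GIB):
    """Map of cgroup file -> value to write for a background compile job."""
    return {
        # Kernel minimum for cpu.shares is 2, i.e. near-idle weight.
        "cpu/cpu.shares": str(cpu_shares),
        # Soft limit: reclaimed under pressure, but the job may still
        # use spare RAM (no hard memory.limit_in_bytes is set).
        "memory/memory.soft_limit_in_bytes": str(mem_soft_bytes),
    }

for path, value in build_settings().items():
    print(f"echo {value} > /sys/fs/cgroup/{path}")
```

Writing these files requires root and an existing cgroup directory; the print statements just show the equivalent shell commands.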

geofft | 4 years ago

Kubernetes supports this too - CPU "requests" are implemented as (a) input to the scheduler (it will not over-schedule a box) and (b) configuration of CPU shares in the kernel CPU cgroup. So that's effectively what we're relying on now with --cpu-cfs-quota. If you have an 8-hyperthread box and three jobs requesting 4, 2, and 2 CPUs, they will use up the entire machine from the point of view of Kubernetes' scheduler, and the first job will get twice as many CPU shares as each of the other two.
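The arithmetic above can be sketched out. Kubernetes converts a CPU request to cpu.shares at roughly 1024 shares per core, and under full contention the kernel divides CPU time proportionally to shares:

```python
# Sketch of requests -> cpu.shares -> CPU split under contention,
# using the 4 / 2 / 2 example on an 8-hyperthread box.

def shares_for_request(cores: float) -> int:
    """cpu.shares ~= requested cores * 1024 (the per-core unit)."""
    return int(cores * 1024)

def contended_split(requests: list, machine_cores: int) -> list:
    """CPU each job gets when every job wants 100%: proportional to shares."""
    shares = [shares_for_request(r) for r in requests]
    total = sum(shares)
    return [machine_cores * s / total for s in shares]

requests = [4, 2, 2]
print([shares_for_request(r) for r in requests])  # [4096, 2048, 2048]
print(contended_split(requests, 8))               # [4.0, 2.0, 2.0]
```

The key property: shares only matter under contention. When the other jobs are idle, nothing stops one job from using the whole box.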

The problem is that we're running services etc., not batch jobs, so we do want them to make meaningful forward progress. So we can't set the shares to zero. We just don't want a misconfigured / runaway job to starve out the rest of the machine, even when the other jobs are not trying to use 100% CPU.
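This is where a CFS quota differs from shares: it is a hard cap that applies even when the machine is idle. The arithmetic, as a sketch: a limit of N cores means the cgroup may run for quota_us out of every period_us (100000us is the kernel's default cfs_period_us):

```python
# CFS bandwidth arithmetic: cpu.cfs_quota_us for a given core limit.

DEFAULT_PERIOD_US = 100_000  # kernel default cfs_period_us

def cfs_quota_us(limit_cores: float, period_us: int = DEFAULT_PERIOD_US) -> int:
    """Runtime budget per period; the cgroup is throttled once it's spent."""
    return int(limit_cores * period_us)

print(cfs_quota_us(2))    # 200000: a 2-CPU limit
print(cfs_quota_us(0.5))  # 50000: half a core
```

So a runaway job capped at its quota can no longer starve the box, at the cost of leaving idle CPU unused.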

A specific sub-case here is capacity planning - right now, even with CPU shares, you can request one CPU for a computationally-intensive multi-thread/process task, and if the rest of the box is running internal web services with sporadic traffic, it will easily be able to use the whole machine. But then if you launch eight instances of that same job on the machine, they'll all perform much worse. So ideally we want to proactively limit CPU usage so that application developers/operators get realistic expectations about performance, and in turn, we get realistic information about how heavily our cluster is actually used.
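A toy model of that capacity-planning scenario, under the simplifying assumptions that the jobs are CPU-bound and the kernel splits CPU evenly among equal-share jobs: with shares only, per-instance performance depends on how packed the box is; with quota pinned to the request, it is stable at the requested level:

```python
# Toy comparison: a job requesting 1 CPU on an 8-core box, with and
# without a quota cap at its request. Assumes CPU-bound jobs with
# equal shares, so contention splits the machine evenly.

def cores_per_instance(n_instances: int, machine_cores: int,
                       request: float, quota_capped: bool) -> float:
    fair_share = machine_cores / n_instances
    if quota_capped:
        return min(fair_share, request)  # never above the request
    return fair_share                    # may soak up idle capacity

# Shares only: alone it gets the whole box, packed it gets 1 core.
print(cores_per_instance(1, 8, 1.0, False))  # 8.0
print(cores_per_instance(8, 8, 1.0, False))  # 1.0
# Quota = request: ~1 core whether alone or packed eight deep.
print(cores_per_instance(1, 8, 1.0, True))   # 1.0
print(cores_per_instance(8, 8, 1.0, True))   # 1.0
```

The 8x swing in the uncapped case is exactly the "unrealistic expectations" problem: developers benchmark against the idle-box number.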