
rsanders | 5 years ago

A container with a request but without a limit is classified as Burstable, and it should only receive CPU in excess of its request once every other container's demand up to its own request has been satisfied.
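
For concreteness, here's a minimal sketch of a pod that lands in the Burstable class (the name and image are just illustrative): a CPU request is set but no limit.

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: Pod
    metadata:
      name: burstable-demo      # illustrative name
    spec:
      containers:
      - name: app
        image: busybox          # illustrative image
        command: ["sleep", "3600"]
        resources:
          requests:
            cpu: 100m           # request but no limit -> Burstable QoS
    EOF

`kubectl get pod burstable-demo -o jsonpath='{.status.qosClass}'` should print Burstable.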

A container without either request or limit is twice-damned, and will be classified as BestEffort. The entire cgroup slice for all BestEffort pods is given a cpu.shares of 2, the kernel minimum, equivalent to 2 milliCPU. If the kernel scheduler is functioning well, no pod in there is going to disrupt anything but other BestEffort pods, no matter how much processor it demands. Throw in a 64-thread busyloop and no Burstable or Guaranteed pods should notice much.
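
You can check this on a node. With cgroup v1 and the cgroupfs driver (which matches the paths on the node shown further down), the BestEffort slice should look something like this; the exact path differs under the systemd cgroup driver (kubepods.slice/kubepods-besteffort.slice):

    [root@node /]# cat /sys/fs/cgroup/cpu/kubepods/besteffort/cpu.shares
    2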

Of course that's the ideal. There is an observable difference between a process that relinquishes its scheduler slice and one that must be pre-empted. But I wouldn't call that a major disruption. Each pod will still be given its full requested share of CPU.

If that's not the case, I'd love to know!

Thaxll | 5 years ago

Are you sure that BestEffort QoS pods don't disrupt the entire node? I remember that in the past a single pod would freeze the entire VM.

rsanders | 5 years ago

I wrote a little fork+spinloop program w/ 100 subprocesses and deployed it with a low (100m) CPU request and no limit. It's driving CPU usage to nearly 100% across all 8 cores on the machine, but the other processes sharing the node are doing fine.
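
It was roughly this shape (a sketch, not the exact program):

    #!/bin/sh
    # fork 100 subprocesses, each spinning in a tight busy loop
    for i in $(seq 100); do
      ( while :; do :; done ) &
    done
    wait

deployed with `resources.requests.cpu: 100m` and no `limits` block, as in the spec sketch above.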

Prometheus scrapes of the node's kubelet have slowed down a bit, but are still under 400ms.

Note that this cluster (which is on EKS) does have system-reserved resources configured.

    [root@ip-10-1-100-143 /]# cat /sys/fs/cgroup/cpu/system.slice/cpu.shares
    1024
    [root@ip-10-1-100-143 /]# cat /sys/fs/cgroup/cpu/kubepods/cpu.shares
    8099
    [root@ip-10-1-100-143 /]# cat /sys/fs/cgroup/cpu/user.slice/cpu.shares
    1024