QiKe | 7 years ago | on: Horrors of Using Azure Kubernetes Service in Production
AKS now reserves 20% of memory on each agent node, plus a very small amount of CPU, so that the Docker daemon and kubelet keep functioning even when customer pods misbehave. However, that just means the customer's pods will be evicted, or will have nowhere to be scheduled, once all resources are used up. This is something we now see in customer support cases.
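To make the arithmetic concrete, here is a rough sketch of what a 20% memory reservation leaves for customer pods. The function names and the flat-percentage model are assumptions for illustration only; the exact formula AKS applies is not given here and may differ.

```python
def allocatable_mib(node_memory_mib: int, reserved_percent: int = 20) -> int:
    """Memory (MiB) left for customer pods after the system reservation.

    Assumes a flat percentage reservation, as described in the comment above.
    """
    return node_memory_mib * (100 - reserved_percent) // 100


def fits(pod_requests_mib: list[int], node_memory_mib: int, reserved_percent: int = 20) -> bool:
    """True if the summed pod memory requests fit in the allocatable memory."""
    return sum(pod_requests_mib) <= allocatable_mib(node_memory_mib, reserved_percent)


if __name__ == "__main__":
    node = 10_000  # hypothetical node size in MiB
    print(allocatable_mib(node))      # memory available to pods
    print(fits([4000, 4000], node))   # requests fit within the allocatable share
    print(fits([4000, 4001], node))   # one MiB over: nowhere to schedule
```

Under this model, a pod that would push the total past the allocatable share has no place to schedule on that node, which is the situation described above.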
QiKe | 7 years ago | on: Horrors of Using Azure Kubernetes Service in Production
(Eng lead for AKS here)
While lots of people have had great success with AKS, we're always concerned when someone has a bad time. In this particular case, the AKS engineering team spent over a day helping identify that the user had overscheduled their nodes by running applications without memory limits, which resulted in the kernel OOM (out-of-memory) killer terminating the Docker daemon and kubelet. As part of this investigation, we increased the system reservation for both Docker and kubelet so that in the future, if a user overschedules their nodes, the kernel will terminate only their applications and not the critical system daemons.
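For readers wondering what "running with a memory limit" looks like, here is a minimal sketch that builds a pod manifest with a memory request and limit set on its container. The `resources.requests` and `resources.limits` fields are standard Kubernetes pod spec fields; the helper name, pod name, image, and sizes are hypothetical.

```python
import json


def pod_with_memory_limit(name: str, image: str, memory: str) -> dict:
    """Build a minimal pod manifest whose single container has a memory limit.

    The request is set equal to the limit here for simplicity (an assumption,
    not a recommendation from the thread).
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {
                        "requests": {"memory": memory},
                        "limits": {"memory": memory},
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image, for illustration only.
    manifest = pod_with_memory_limit("example-app", "example/app:latest", "256Mi")
    print(json.dumps(manifest, indent=2))
```

With a limit in place, a container that exceeds it is the one that gets killed, rather than memory pressure building up on the node as a whole.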