hazz99 | 3 months ago

I’m sure this work is very impressive, but these QPS numbers don’t seem particularly high to me, at least compared to existing horizontally scalable service patterns. Why is it hard for the kube control plane to hit these numbers?

For instance, postgres can hit this sort of QPS easily, afaik. It’s not distributed, but I’m sure Vitess could do something similar. The query patterns don’t seem particularly complex either.

Not trying to be reductive - I’m sure there’s some complexity here I’m missing!

phrotoma | 3 months ago

I am extremely Not A Database Person, but I understand that the rationale for Kubernetes adopting etcd as its preferred data store was more about its distributed consistency features and less about query throughput. etcd is slower because it's doing Raft things and flushing stuff to disk.

Projects like kine allow K8s users to swap sqlite or postgres in place of etcd, which (I assume, please correct me otherwise) would deliver better throughput since those backends don't need to perform consensus operations.

https://github.com/k3s-io/kine

dijit | 3 months ago

You might not be a database person, but you’re spot on.

A well-managed HA PostgreSQL setup (active/passive) is going to run circles around etcd for kube control-plane operations.

The caveat here is an increased risk of downtime and much higher management overhead, which is why it's not the default.

Sayrus | 3 months ago

GKE uses Spanner as an etcd replacement.

PunchyHamster | 3 months ago

It's not really bottlenecked by the store, but by the calculations performed on each pod schedule/creation.

It's basically "take the global state of node load and capacity, pick where to schedule it", and I'd imagine it's probably not running in parallel, because that would be far harder to manage.
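Not the real scheduler, just a toy Go sketch of the shape of that loop — the Node struct and the single score (free CPU) are made up for illustration; the actual kube-scheduler runs many filter and score plugins over far richer node state:

```go
package main

import "fmt"

// Node is a toy stand-in for a kubelet's reported state:
// allocatable CPU (millicores) and what is already requested.
type Node struct {
	Name        string
	Allocatable int
	Requested   int
}

// schedule mimics the two phases the comment describes, greatly
// simplified: filter out nodes that can't fit the pod, then score
// the survivors and pick the best. Here the only score is free
// capacity, and the pass over global cluster state is sequential.
func schedule(nodes []Node, podCPU int) (string, bool) {
	best, bestFree := "", -1
	for _, n := range nodes { // one pass over global cluster state
		free := n.Allocatable - n.Requested
		if free < podCPU { // filter phase: pod doesn't fit
			continue
		}
		if free > bestFree { // score phase: prefer most headroom
			best, bestFree = n.Name, free
		}
	}
	return best, best != ""
}

func main() {
	nodes := []Node{
		{"node-a", 4000, 3500},
		{"node-b", 4000, 1000},
		{"node-c", 2000, 500},
	}
	name, ok := schedule(nodes, 1000)
	fmt.Println(name, ok) // prints: node-b true
}
```

Even in this stripped-down form you can see why every pending pod costs a scan of cluster state rather than a single indexed query.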

senorrib | 3 months ago

Not a k8s dev, but I feel like this is the answer. K8s isn't usually just scheduling pods round-robin or at random. There's a lot of state to evaluate, and scheduling pods becomes an NP-hard problem similar to bin packing. I doubt the implementation tries to be optimal here, but it feels like a computationally heavy problem.
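To make the bin-packing analogy concrete, here's first-fit-decreasing, a classic greedy heuristic for that problem — this is an illustration of the problem class, not anything kube-scheduler actually runs:

```go
package main

import (
	"fmt"
	"sort"
)

// firstFitDecreasing packs items (pod CPU requests) into bins
// (nodes of equal capacity): sort descending, then drop each item
// into the first bin with room, opening a new bin when none fits.
// It returns how many bins the greedy packing uses.
func firstFitDecreasing(sizes []int, capacity int) int {
	sorted := append([]int(nil), sizes...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	var free []int // remaining capacity of each opened bin
	for _, s := range sorted {
		placed := false
		for i := range free {
			if free[i] >= s { // first bin it fits in
				free[i] -= s
				placed = true
				break
			}
		}
		if !placed { // open a new bin (node)
			free = append(free, capacity-s)
		}
	}
	return len(free)
}

func main() {
	// Pods of 5,4,4,3,2,2 CPU onto 10-CPU nodes.
	fmt.Println(firstFitDecreasing([]int{5, 4, 4, 3, 2, 2}, 10)) // prints: 3
}
```

Note the example uses 3 nodes even though 2 would suffice (5+3+2 and 4+4+2): finding the optimal packing is NP-hard, so practical schedulers settle for fast approximations like this.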

__turbobrew__ | 3 months ago

The k8s scheduler lets you tweak how many nodes to look at when scheduling a pod ("percentage of nodes to score"), so you can change how big the "global state" is as far as the scheduling algorithm is concerned.
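For reference, that knob is a top-level field in the scheduler's config file — a minimal sketch, assuming the `kubescheduler.config.k8s.io/v1` API version (check what your cluster ships):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
# Score only ~50% of the feasible nodes for each pod instead of all
# of them. Lower values cut scheduling latency on large clusters at
# the cost of slightly less optimal placements.
percentageOfNodesToScore: 50
```

The default is adaptive: on big clusters the scheduler already samples a shrinking fraction of nodes rather than scoring every one.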

nonameiguess | 3 months ago

It says in the blog that they require 13,000 queries per second just to update lease objects, not that 13,000 is the total for all queries. I don't know why they cite that instead of the total, but etcd's own performance testing indicates it can handle at least 50,000 writes per second and 180,000 reads: https://etcd.io/docs/v3.6/op-guide/performance/. So, without them saying what the real number is, I'm going to guess their reads and writes outside of lease updates are much larger than those benchmark numbers.