psoundy|5 years ago
The usage patterns of native k8s types, and the implications those patterns have for the scalability and reliability of etcd and the apiserver, are relatively well understood. CRDs can be a wild card, though, and afaik testing efforts thus far have not investigated worst-case usage by CRD-based applications.
As commonly deployed, CRDs are served from the same apiservers and etcd cluster that serve the native types for a k8s cluster. That can result in contention between the CRDs supporting third-party additions to a cluster and the native types critical to the health of the cluster. This kind of contention has the potential to bring a cluster to its knees.
Efforts like API Priority and Fairness seek to ensure that the apiserver can prioritize at the level of the API call. But that won't prevent watch caches from OOMing the apiserver if excessive numbers of custom resources are present. The judicious use of quotas could head off the creation of an excessive number of objects, but it's not just count that matters: the size of each resource is also a factor.
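For the count side at least, object-count quotas do cover custom resources via the `count/<resource>.<group>` syntax. A minimal sketch, assuming a hypothetical `widgets.example.com` CRD and a `team-a` namespace:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: widget-count
  namespace: team-a
spec:
  hard:
    # Object-count quota for a custom resource: count/<resource>.<group>.
    # Caps how many Widget objects can exist in this namespace. Note this
    # limits count only, not per-object size.
    count/widgets.example.com: "500"
```

As noted above, though, this does nothing about individual objects that are large.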
In theory, CRDs could be isolated from native types by serving them from an aggregated apiserver backed by a separate etcd cluster. afaik this is not a supported configuration today, and even if it were, the additional resources required to support it (especially the separate etcd cluster) may be prohibitive for many use cases.
jacques_chester|5 years ago
You can actually nominate particular types to be stored in particular etcd servers -- GKE does this to put Events into a separate etcd from everything else.
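The upstream mechanism for this is the kube-apiserver `--etcd-servers-overrides` flag, which routes a given group/resource to a different etcd endpoint. A sketch with made-up endpoint names:

```shell
# Override format is group/resource#serverList (comma-separated for multiple
# overrides). Events are in the core group, hence the bare "/events".
# Endpoint names here are hypothetical.
kube-apiserver \
  --etcd-servers=https://etcd-main-0:2379 \
  --etcd-servers-overrides=/events#https://etcd-events-0:2379 \
  ...
```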
However, it still has problems. Firstly, you can only define it for inbuilt types. Secondly, it's common for different objects to cross reference each other through objectRefs and the like, which behave badly when you effectively perform a join in the API server over multiple etcds.
bogomipz|5 years ago
Interesting. Is this documented anywhere?
bogomipz|5 years ago
>"But operating with CRDs at scale is a different story and suggests careful testing with the specific applications involved."
Do you mean the number of different CRDs deployed here, or just the number of custom resources created? Or is it the same concern either way? I'd also be curious what you're defining as "scale".
psoundy|5 years ago
Scalability is relative, and depends on many factors including but not limited to:
- the resources available on the hosts running apiservers and etcd members
- the number and size of resources (custom and native) that controllers will maintain
Relatively speaking, a cluster of a given size might be perfectly capable of handling on the order of many thousands of resources. Push that up an order of magnitude and the overhead of serving LIST calls - marshaling json from etcd to golang structs for apimachinery and back again for sending over the wire - could exhaust an apiserver's memory allocation. And since the impact of resources is cumulative, any one application relying on lots of CRDs might not destabilize a cluster on its own but might well contribute to an unhealthy cluster when running alongside similarly CRD-heavy applications.
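A rough back-of-envelope for that LIST overhead - the 3x expansion factor below is an assumption standing in for the multiple in-memory copies (raw etcd bytes, decoded structs, re-serialized response), not a measured constant:

```python
def list_call_memory_bytes(num_objects, avg_object_bytes, expansion_factor=3.0):
    """Rough upper bound on transient apiserver memory for one LIST call.

    The expansion factor models the several representations of the data
    that can be live at once during marshaling (an assumed multiplier,
    not a measured value).
    """
    return int(num_objects * avg_object_bytes * expansion_factor)

# 10,000 resources at ~10 KiB each: on the order of hundreds of MiB.
small = list_call_memory_bytes(10_000, 10 * 1024)
# An order of magnitude more pushes into multiple GiB per LIST.
large = list_call_memory_bytes(100_000, 10 * 1024)
```

The exact numbers matter less than the shape: cost scales with count times size, which is why neither count quotas nor size limits alone are sufficient.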
The key takeaway is that the kube api is best thought of as a specialized operational store rather than a general-purpose database. Anyone wanting to rely on CRDs at non-trivial scale would be well-advised to test carefully.