I would recommend against running stateful apps in kubernetes. It's not really ready for it. Big problems include routing (it works fine for http requests, but not for DBs, message brokers, etc) and just the pain of setting up stateful sets.
We run stateful apps on Kubernetes. There are obvious rough areas (the lack of persistent volume resizing, for example, which is scheduled for 1.11), but overall, it's great.
What a lot of naysayers leave out, or choose to ignore, is that the challenges running stateful apps on Kubernetes mirror those of running stateful apps anywhere. If you run Postgres on a VM, for example, you're completely reliant on that VM staying up -- this is no different from Kubernetes. Some will also point out the dangers of co-locating lots of software (such as Postgres) on the same machine as many other containers, as they will compete for CPU and I/O; but this is also no different than on Kubernetes, which provides plenty of tools (affinities/anti-affinities, node selectors) to isolate containers to machines. And so on. Containers bring some new challenges, but Kubernetes meets them quite well.
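To make the isolation point concrete, the node-selector and anti-affinity tools mentioned above boil down to a few fields on the pod spec. A minimal sketch as a plain Python dict (the field names are standard Kubernetes API fields; the `workload-class` label and `postgres` app label are made up for illustration):

```python
# Sketch of pod-spec fields that pin a database to dedicated nodes and
# keep replicas off the same machine. Label values are hypothetical.
pod_spec = {
    # Only schedule onto nodes labeled for database workloads.
    "nodeSelector": {"workload-class": "database"},
    "affinity": {
        "podAntiAffinity": {
            # Hard requirement: never co-locate two postgres pods on one host.
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": {"app": "postgres"}},
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    },
}
```

The same fields translate directly into the YAML you'd put in a manifest; `preferredDuringSchedulingIgnoredDuringExecution` is the soft variant if a hard constraint is too strict.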
What specific issues do you have? I'm not sure I understand the point about routing. I also don't understand what the "pain" of stateful sets refers to.
I've used Kubernetes a massive amount during the last two years for running stateful apps. In contrast, I do recommend it. Yes, it is challenging, since stateful apps are. However, the challenges are all well worth solving in the context of Kubernetes (great benefits from health checks, automated reproducible deployments, etc.). The situation is pretty good these days in my experience; at least, a lot better than 2 years ago!
Author of post here and CRL employee: just for some additional detail, we reached out to Kelsey about the problems he's seen running databases in Kubernetes.
He said "You still need to worry about database backups and restores. You need to consider downtime during cluster upgrades."
These things are totally true. K8s doesn't automate backups (edit: by default; though, it can) and if you need to take K8s down for upgrades, then everything is down. For its part, though, CockroachDB supports rolling upgrades with no downtime on Kubernetes.
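For context, the one-pod-at-a-time behavior comes from the StatefulSet update strategy. A hedged sketch of the relevant spec fragment, written as a Python dict (the field names are real StatefulSet API fields; the container name and image tag are illustrative, not a pinned recommendation):

```python
# Sketch: StatefulSet fragment enabling a rolling upgrade. With
# type=RollingUpdate, Kubernetes replaces pods one ordinal at a time,
# highest first, waiting for each to become Ready before continuing.
statefulset_spec = {
    "updateStrategy": {
        "type": "RollingUpdate",
        # partition=0 rolls every ordinal; a larger value holds back
        # lower ordinals, which is useful for canarying one replica.
        "rollingUpdate": {"partition": 0},
    },
    "template": {
        "spec": {
            "containers": [
                # Image tag is illustrative only.
                {"name": "cockroachdb", "image": "cockroachdb/cockroach:v2.0.0"}
            ]
        }
    },
}
```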
As for routing, that is a tough problem if you want to run K8s across multiple regions, though we have some folks who've done it.
We run many, many stateful apps on Kubernetes. Not without challenges, certainly, but I am not sure any of them are really Kubernetes-specific.
They just don't act like other services, and require more care. That's about it. I think that's what Kelsey is referring to: you can't just treat them the same as other pods.
> Because Kubernetes itself runs on the machines that are running your databases, it will consume some resources and will slightly impact performance. In our testing, we found an approximately 5% dip in throughput on a simple key-value workload.
5% seems like a surprisingly large overhead. What is k8s doing in this situation that would have that kind of impact?
Hard to say without knowing what they ran on what. There is a non-trivial amount of memory eaten up by the various k8s processes, Docker, and networking if you are using small nodes. I have a completely empty k8s cluster up right now with 1 worker and 1 master, and the worker has ~230 MB of RAM used.
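Back-of-the-envelope, using the ~230 MB figure above (an observed number on one setup, not a guaranteed constant), the fixed per-node overhead matters far more on small nodes:

```python
OVERHEAD_MB = 230  # approximate idle usage observed above; varies by setup

def overhead_fraction(node_ram_gb: float) -> float:
    """Fraction of a node's RAM eaten by the fixed k8s/Docker overhead."""
    return OVERHEAD_MB / (node_ram_gb * 1024)

# Roughly 11% of a 2 GB node, but under 1% of a 32 GB node.
for gb in (2, 8, 32):
    print(f"{gb:>2} GB node: {overhead_fraction(gb):.1%} of RAM")
```

The same fixed cost that is noise on a large machine is a double-digit percentage on a small one, which is one reason benchmark overhead depends so heavily on node sizing.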
I'd like to know how to solve the storage dilution problem with stateful apps in k8s where you have to buy 3-18x more raw capacity than desired to meet availability & durability guarantees.
For example, if you ran CockroachDB on a bare-metal cluster of 3 nodes with 30TB of raw capacity, 15TB is lost to RAID10 and 10TB is lost to running a replicated database such as CockroachDB, leaving you with 5TB effective capacity, which is a 1/6 dilution of your initial capacity.
If you ran CockroachDB on a replicated network volume with a replication factor of three, it gets worse. If you bought 30TB of disks, you'd lose 20TB to volume replication and ~6.67TB to CockroachDB replication, leaving you with 3.33TB of effective capacity, or a 1/9 dilution. If those disks were also configured with RAID, your effective capacity would drop to a 1/18 dilution.
You could achieve a 1/3 dilution which is the effective minimum for a replicated database if you didn't configure RAID, but you increase the impact of disk failure, in that it would take much much longer to recover a cluster.
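The arithmetic in the scenarios above can be checked with a small helper, treating each layer as a divisor (a factor of 1 means that layer is absent):

```python
def effective_tb(raw_tb: float, raid: int = 1, volume_rf: int = 1, db_rf: int = 1) -> float:
    """Usable capacity after each replication layer divides the raw space.

    raid: 2 for RAID10 (mirrored), 1 for none.
    volume_rf: replication factor of a networked volume layer, 1 for local disk.
    db_rf: database-level replication factor (e.g. 3 for CockroachDB's default).
    """
    return raw_tb / (raid * volume_rf * db_rf)

print(round(effective_tb(30, raid=2, db_rf=3), 2))               # 5.0  -> 1/6 dilution
print(round(effective_tb(30, volume_rf=3, db_rf=3), 2))          # 3.33 -> 1/9 dilution
print(round(effective_tb(30, raid=2, volume_rf=3, db_rf=3), 2))  # 1.67 -> 1/18 dilution
print(round(effective_tb(30, db_rf=3), 2))                       # 10.0 -> 1/3 dilution
```

This reproduces all four figures quoted above, and makes the underlying point explicit: the dilutions multiply, so stacking redundant layers (RAID under a replicated volume under a replicated database) is what produces the extreme 1/18 case.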
It's my understanding, based on comments by googlers here on HN, that Google does run a bunch of apps on GKE. We don't know which apps, but it's not surprising that they want to dogfood their own cloud platform.
Has anyone looked at Service Fabric (Microsoft tech) for things like this? That has offered stateful services for years now. I'm pretty sure it runs on Linux, and I've seen that it's Docker compatible. I know it's kinda in the same space as K8s but I don't really know the details. Would SF be able to do something like this in a similar (or better?) way?
It's complicated, because the definition of Service Fabric seems to be in flux.
The "original" Service Fabric is a high-level framework which requires invasive source code changes (you can't just drop an existing app on top of it), but gives you lots of benefits (scale, reliability etc) if you make the effort.
Recently container-based platforms - Docker, Kubernetes, etc - have come along with a different tradeoff: better compatibility with existing applications in exchange for less magical benefits. That approach is getting much more traction, and I think internally at Microsoft there is some infighting between the "Service Fabric camp" and the "Containers camp". One consequence of the infighting is that Service Fabric is extending its scope to include features like "container support". It's not clear to what extent that is done in collaboration with the "container people", or as a way to bypass them. I think they are still trying to decide whether to embrace Kubernetes, or replicate the functionality in-house. My prediction is that the container-based approach will win, but it will take time for the politics to fully play out. In the meantime things will continue to be confusing.
Bottom line: when evaluating Service Fabric, watch out for confusing and inconsistent use of the brand. It's a common pattern with large vendors - for example IBM with "Bluemix", SAP with "Hana", etc.
Disclaimer: I work at Microsoft, not on Service Fabric but I have built complex stateful services on top of Service Fabric.
As zapita said, Service Fabric now handles containers but I think it is just because containers became trendy and FOMO kicked in.
Where Service Fabric is decades ahead of the container orchestration solutions is as a framework to build truly stateful services, meaning the state is entirely managed by your code through SF, not externalized in a remote disk, Redis, some DB, etc...
It offers high-level primitives like reliable collections [0], as well as very low-level primitives like a replicated log to implement custom replication between replicas [1]. I feel this is not advertised enough publicly, which is unfortunate, because it is a key differentiator for Service Fabric that the competitors won't have for a while, if ever: it is a completely opposite approach. Containers are all about isolation, being self-contained, and platform-independent, while SF stateful services are deeply integrated with Service Fabric.
Are there any cloud providers offering remote disks without replication?
It seems there would be demand for this when deploying databases in which replication is maintained by the database itself.
bajsejohannes|7 years ago
If you don't believe me, take it from someone who should know what they're talking about: https://twitter.com/kelseyhightower/status/96341350830081229...
loiselleatwork|7 years ago
https://twitter.com/kelseyhightower/status/96347131657256140...
And if one finds setting up StatefulSets challenging, we have a tutorial on how to do it written by a former Kubernetes engineer: https://www.cockroachlabs.com/docs/stable/orchestrate-cockro...
smarterclayton|7 years ago
We haven't yet evolved Kubernetes services to prefer specific cores and avoid app workloads (although CPU management is getting closer).
Docker is also somewhat hefty memory-wise, and you may contend on disk if not careful.
5% seems pretty reasonable to me in general, just as a consequence of having something heavier weight on the same node managing workloads.
lowbloodsugar|7 years ago
I understood that a team at Google developed k8s, but Google doesn't actually run it for their "google-scale" workloads. Am I misinformed?
bajsejohannes|7 years ago
> [kubernetes is] a simplified clone of Google’s internal borg system
https://medium.com/@steve.yegge/honestly-i-cant-stand-k8s-48...
[0] https://docs.microsoft.com/en-us/azure/service-fabric/servic...
[1] https://docs.microsoft.com/en-us/dotnet/api/system.fabric.fa...