Swarm vs. Fleet vs. Kubernetes vs. Mesos

205 points | amouat | 10 years ago | radar.oreilly.com | reply

83 comments

[+] krenoten|10 years ago|reply
This is a pretty balanced article. Absent is Nomad, which claims the world, but when you dig into the code, pretty large advertised chunks are simply not there yet. Nomad seems like a much more straightforward implementation of the Borg paper, and one day may be interesting once they write the rest of it. A nice Kubernetes feature that is similar to what you can do with fleet is the “Daemon Set”, which lets you run certain things on every node. Some cool Mesos features that are pretty new and haven’t been talked about much yet:

* persistent volumes: let frameworks have directories on agent machines returned to them after an availability event happens, which is nice for replicated stateful services

* maintenance primitives: schedule machines to go offline at certain times, and ask frameworks if it’s safe to take nodes offline. This will soon start being used for stateful services so that they can vote on when it’s safe to take out a replica, and to trigger proactive rereplication when maintenance is desired.

* oversubscription: if an agent has given away all of its resources but detects that there is still some unutilized CPU, it can start “revocable tasks” to fill the slack until they start interfering with existing workloads.
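
To make the oversubscription idea concrete, here is a minimal Python sketch of how an agent might estimate slack for revocable tasks. This is not Mesos's actual estimator: the `Agent` fields, the `revocable_cpus` function and the `safety_margin` parameter are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    total_cpus: float      # CPUs physically present on the machine
    allocated_cpus: float  # CPUs already promised to regular tasks
    used_cpus: float       # CPUs those tasks are actually using right now


def revocable_cpus(agent: Agent, safety_margin: float = 0.25) -> float:
    """Estimate CPU slack that could be offered to revocable tasks.

    Slack is the allocated-but-idle CPU, minus a reserve (a fraction of
    total CPUs, an assumption of this sketch) that is kept free so that
    revocable work is less likely to interfere with regular tasks.
    """
    idle = agent.allocated_cpus - agent.used_cpus
    reserve = agent.total_cpus * safety_margin
    return max(0.0, idle - reserve)


if __name__ == "__main__":
    busy_on_paper = Agent(total_cpus=8, allocated_cpus=8, used_cpus=3)
    print(revocable_cpus(busy_on_paper))
```

When measured usage climbs back toward the allocation, the estimate drops to zero, which is the point at which a real system would start revoking those tasks.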

[+] jacques_chester|10 years ago|reply
Cloud Foundry has a scheduler called Diego[1], which is now in public beta on PWS. Because it's built for Cloud Foundry, you get the whole PaaS as well. No need to roll your own service injection, staging, logging, authentication and so on: it's already built and integrated.

For me, the cleverest part about Diego is that it turns placement into a distributed problem through an auctions mechanism. Other attempts at this focus on algorithmic approaches that assume perfect information. Diego instead asks cells to bid on incoming tasks and processes. The auction closes after a fixed time and the task or process is sent to the best bidder. This greatly reduces the need to have perfect central consistency in order to perform bin-packing optimisation -- in a real environment, that turns out to matter a lot.
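
A rough Python sketch of the auction idea as described above, not Diego's actual code. The `Cell` class, the memory-only scoring rule and `run_auction` are hypothetical; a real auction would collect bids over the network and close after a timeout, whereas this sketch gathers them synchronously.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    name: str
    free_memory_mb: int

    def bid(self, task_memory_mb: int) -> Optional[float]:
        """Return a score (lower is better), or None if the task does not fit."""
        if task_memory_mb > self.free_memory_mb:
            return None
        # Hypothetical scoring: prefer the cell that would be left with the
        # least free memory, i.e. the tightest packing.
        return float(self.free_memory_mb - task_memory_mb)


def run_auction(cells: List[Cell], task_memory_mb: int) -> Optional[Cell]:
    """Ask every cell for a bid and place the task on the best bidder."""
    bids = [(cell.bid(task_memory_mb), cell) for cell in cells]
    valid = [(score, cell) for score, cell in bids if score is not None]
    if not valid:
        return None  # no cell can host the task
    _, winner = min(valid, key=lambda pair: pair[0])
    winner.free_memory_mb -= task_memory_mb
    return winner


if __name__ == "__main__":
    cells = [Cell("cell-a", 1024), Cell("cell-b", 512)]
    placed = run_auction(cells, 400)
    print(placed.name if placed else "unplaced")
```

The scheduler never needs a consistent global view here: each cell answers from its own local state, and the auctioneer only compares the answers it received.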

Cloud Foundry is large and very featuresome, so for those who want a more approachable way to play with Diego, try Lattice[2].

Disclaimer: I work in Pivotal Labs, a division of Pivotal. Pivotal is the major donor of engineering effort to Cloud Foundry. I worked on the Buildpacks team and I've deployed commercial apps to PWS, so I'm obviously a one-eyed fan.

[1] https://github.com/cloudfoundry-incubator/diego-design-notes

[2] http://lattice.cf/

[+] unknown|10 years ago|reply

[deleted]

[+] amouat|10 years ago|reply
Could you compare Diego to Mesos? On the face of it, this sounds like a similar system.
[+] jacques_chester|10 years ago|reply
Since it's too late to edit, turns out I'm out of date about the current way Diego works. There's an auction mechanism but it's been made more centralised due to a thundering herd problem. It more closely resembles Mesos and others, insofar as it collects resource reports and selects candidates based on standard bin-packing techniques; cells can then reject placements.
[+] jtarchie|10 years ago|reply
Would you say your knowledge of Cloud Foundry and Diego is directly proportional to how fancy you dress on Fridays? Because today is Friday and I don't see fancy dress.
[+] muraiki|10 years ago|reply
How well does Cloud Foundry play with stateful containers?
[+] meddlepal|10 years ago|reply
Lattice looks pretty cool. Thanks for the reading material.
[+] vidarh|10 years ago|reply
One more thing worth mentioning with fleet is that you can schedule anything systemd can handle. This means you can have it schedule timers, for example, and if the machine your timer runs on dies, fleet will re-schedule the timer on another machine.

For fleet, a timer is just another systemd unit (there's no special support for them), so you get a simple "cluster wide cron" pretty much for free.

Fleet works fairly well, though it does have some minor maturity issues (I've gotten it into weird states when machines have left the cluster abruptly and rejoined, where e.g. it refuses to schedule some unit afterwards; solution: change the name of the unit). No idea how well it'll scale in larger deployments.

[+] kevinsimper|10 years ago|reply
I have tried Swarm intensively and it is not ready at all, sadly! There is no support for pulling from private registries, there is no rescheduling if one of the nodes goes down, and there is no intelligent way to manage volumes.

You are actually better off managing each server individually than using Swarm.

(hard earned truth)

[+] amouat|10 years ago|reply
Yes, I should have mentioned in the article that Swarm is still heavily under development.

I'm still wondering if and how they're going to address the idea of co-scheduling groups of containers, like pods in Kubernetes. This is a common need, but Docker don't seem very keen: https://github.com/docker/docker/issues/8781

[+] darren0|10 years ago|reply
I just wanted to throw Rancher into the mix. Rancher is a more recent entrant into the space and still in beta. Rancher focuses on simplicity, flexibility, and pragmatic solutions for Docker-based infrastructure. Rancher is different in that it can be deployed in conjunction with all of the systems mentioned or can be run as a replacement. It includes a scheduler, load balancing, health checks, service discovery, application templating based on compose syntax, storage management, application catalog, upgrade management, GitHub/LDAP integration, UI, API, CLI, etc.

Disclaimer: Co-founder of Rancher Labs and chief geeky guy behind Rancher

[+] z3ugma|10 years ago|reply
Rancher looks really good. I'm going to try it out - can you guys add a launch guide for Rancher OS on DigitalOcean?

Also, I want to make a plugin/service that's dependent on what you've built solely for the pun "Huevos RancherOS"

[+] unethical_ban|10 years ago|reply
HN: Has anyone here worked significantly with SmartOS and its associated tools? I love the idea of a Zones/ZFS-backed container OS, but their documentation looks very sloppy. Does anyone here have extensive experience with it (who's not Bryan Cantrill - your BSDNow podcast interview is what got me looking)
[+] lucd|10 years ago|reply
Yes, Joyent's Triton may be the best way to run Docker containers, not only for ZFS but also for illumos zones and network virtualization. You may be interested in this article from Casey Bisson about running Mesos on Triton:

https://www.joyent.com/blog/mesos-by-the-pound

[+] amouat|10 years ago|reply
I haven't, but note you can use ZFS with Docker today.
[+] handimon|10 years ago|reply
Nomad is another scheduler released by hashicorp. This probably wasn't available when the article was written but I am curious how it would compare to the others.

https://www.nomadproject.io

[+] dberg|10 years ago|reply
Thanks for posting, Nomad looks really impressive. Have you run it in production?
[+] tacotuesday|10 years ago|reply
I'm leaning toward Kubernetes because that's what fabric8.io has chosen. The others may be great, but I'm not really interested in writing a lot of glue code to make them work well with Jenkins/Gerrit/Nexus/Slack/Docker/Chaos Monkey/OpenShift the way Fabric8 has.
[+] KirinDave|10 years ago|reply
Only Kubernetes has facilities for hosting external services reliably. I'm not sure why these other lower level tools are being compared to it. Especially Mesos and Marathon, which are much lower level.
[+] amouat|10 years ago|reply
They are being compared as they are the main options for clustering and scheduling containers. I'd agree Kubernetes is at a higher level than the other options, or at least comes with more features.

I'm not sure what you mean by "hosting external services reliably" - what's external and who is unreliable?

[+] manojlds|10 years ago|reply
Marathon and Kubernetes are at the same level, so why do you call it lower level? Mesos, yes, and Kubernetes can run on Mesos too.
[+] idlewords|10 years ago|reply
What is the threshold number of computers past which using this stuff is worth the tradeoff in complexity?
[+] rconti|10 years ago|reply
This is something we face in our environment. The OSes are pretty homogenous, but the application set is very diverse; 16 of these, 2 of these, 8 of those, and so on. It's made previous orchestration tools a bit more unwieldy, but of course manual control is unwieldy as well!

In addition, of course, the task of learning, implementing, and evaluating the options takes a large amount of time on top of the time we already spend (mostly) manually maintaining infrastructure.

Articles like this are a great stepping stone.

[+] cwmma|10 years ago|reply
I deploy a pretty small app with like 2 instances per deploy and 3 deploys (beta and 2 production versions for different clients) and Kubernetes is 100% worth it for the rolling updates and the ability to build the image once and push it different places.
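
For readers who haven't seen one, the core loop of a rolling update can be sketched in a few lines of Python. This is only an illustration of the general technique, not how Kubernetes implements it; `rolling_update`, the instance dicts and the `is_healthy` callback are hypothetical.

```python
from typing import Callable, Dict, List


def rolling_update(
    instances: List[Dict[str, str]],
    new_version: str,
    is_healthy: Callable[[Dict[str, str]], bool],
) -> bool:
    """Upgrade instances one at a time, stopping at the first failure.

    Each instance is a dict with "name" and "version" keys. Only one
    instance is touched at a time, so the rest keep serving traffic.
    Returns True if every instance ended up on new_version.
    """
    for instance in instances:
        old_version = instance["version"]
        instance["version"] = new_version
        if not is_healthy(instance):
            # Put this instance back and leave the remaining ones untouched.
            instance["version"] = old_version
            return False
    return True


if __name__ == "__main__":
    fleet = [{"name": "web-1", "version": "v1"}, {"name": "web-2", "version": "v1"}]
    ok = rolling_update(fleet, "v2", is_healthy=lambda inst: True)
    print(ok, [inst["version"] for inst in fleet])
```

Even with only two instances per deploy, this means a bad image takes down at most one of them before the update halts.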
[+] NateDad|10 years ago|reply
Would have liked to see Juju in this comparison. Maybe Juju is too flexible, since it's not restricted to just deploying Docker containers?

(disclosure, I'm a Juju dev)

[+] tacotuesday|10 years ago|reply
Is there an easy way to try Juju on a laptop? Like a vagrant vm or something? Thank you :)
[+] coleca|10 years ago|reply
Any thoughts on comparisons of these vs Amazon ECS? I know it's not open and portable, but still interested in understanding the differences.
[+] pkinney|10 years ago|reply
We attempted to use ECS for a while before ultimately switching to Kubernetes. While its tight, built-in integration with AWS's Elastic Load Balancers and Auto-Scaling Groups made it fit well in the AWS ecosystem, we found that there wasn't enough visibility into the system. Containers would be stopped without notification or logging and not restarted.

We've found the Kubernetes primitives to be the easiest and most straight-forward to work with while still providing a very powerful API around which to wrap all sorts of custom tooling.

[+] lifty|10 years ago|reply
I am looking forward to the day when Mesos (or any other PaaS software) will be able to use Ceph to save application (app/container/VM) state. In such a setup, applications could move around the compute cluster without losing state.
[+] jfindley|10 years ago|reply
Mesos does now have early-stage support for persistent disks. See https://github.com/apache/mesos/blob/master/docs/persistent-...

There are also technologies like Flocker that will allow you to do this with Docker volumes (which can then be run inside Mesos).

Both of these are at a pretty early stage of life, so there are some rough edges, but if you really need it, it's there.

[+] fh973|10 years ago|reply
Containers and cluster storage bring you really close to what Google's infrastructure looks like, especially when the systems are fully fault-tolerant like Mesos and Ceph. One of the drawbacks of Ceph is that its filesystem is not ready for production yet, and you have to resort to block devices that can only be accessed from one host at a time.

At Quobyte (http://www.quobyte.com, disclaimer: I am one of the founders), we have built a fully fault-tolerant distributed file system. This allows concurrent scalable shared access to file systems from any number of hosts. Think of a /data that is accessible from any host and can be mapped in any container.

What I found pretty neat was that we could easily do a mysql HA setup on Mesos: put mysql in a container, use a directory on /quobyte for its data, and enable Quobyte's mandatory file locking. When you kill the container, or switch off/unplug its host, the container gets rescheduled and recovers from the shared file system.
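
The reason locking matters in a setup like this is that a rescheduled container must not start writing while the old one still holds the data directory. As a loose illustration only, here is a Python sketch using an advisory `flock` on a lock file. This is not Quobyte's mandatory locking, lock semantics on network and distributed file systems vary, and `try_acquire`, `release` and the lock-file name are hypothetical.

```python
import fcntl
import os
from typing import Optional


def try_acquire(lock_path: str) -> Optional[int]:
    """Try to take an exclusive, non-blocking advisory lock (Unix only).

    Returns an open file descriptor holding the lock, or None if some
    other holder already has it.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def release(fd: int) -> None:
    """Drop the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


if __name__ == "__main__":
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), "datadir.lock")
    holder = try_acquire(path)
    print("first holder got lock:", holder is not None)
    print("second holder got lock:", try_acquire(path) is not None)
    if holder is not None:
        release(holder)
```

A database instance that cannot take the lock would refuse to start, which is the behaviour you want when the previous instance might still be alive on a partitioned host.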

[+] ownagefool|10 years ago|reply
We're already doing this with kubernetes + fleet.

Ceph itself seems really stable, though we did have an issue with Kubernetes not acquiring a lock on Ceph and thus having 2 nodes write in the event that one went unresponsive. This was patched in a later version, but make sure it's merged into the release you go with.

[+] weavie|10 years ago|reply
So the article seems to suggest that for smaller clusters use Kubernetes, for larger ones use Mesos.

Would people agree?

[+] amouat|10 years ago|reply
Author here. I would say it's a bit more subtle than that.

If you have a very large cluster (1000s of machines), Mesos may well be the best fit, as you are likely to want Mesos's support for diverse workloads and the extra assurance of the comparative maturity of the project.

The big difference with Kubernetes is that it enforces a certain application style; you need to understand the various concepts (pods, labels, replication controllers) and build your applications to work with those in mind. I haven't seen any figures, but I would expect Kubernetes to scale well for the majority of projects.
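
To give a feel for those concepts, here is a toy Python model of label selection and a replication controller's reconcile step. It is a sketch of the idea only and does not use the Kubernetes API; `matches`, `reconcile`, the pod dicts and the generated pod names are hypothetical.

```python
import itertools
from typing import Dict, List

Pod = Dict[str, object]

_pod_ids = itertools.count(1)  # source of names for newly created pods


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """A pod matches when it carries every key/value pair in the selector."""
    return all(labels.get(key) == value for key, value in selector.items())


def reconcile(
    pods: List[Pod],
    selector: Dict[str, str],
    replicas: int,
    template_labels: Dict[str, str],
) -> List[Pod]:
    """Return a pod list with exactly `replicas` pods matching the selector.

    Pods that do not match the selector are left alone; surplus matching
    pods are dropped and missing ones are created from the template labels.
    """
    matching = [p for p in pods if matches(selector, p["labels"])]
    others = [p for p in pods if not matches(selector, p["labels"])]
    del matching[replicas:]  # scale down if there are too many
    while len(matching) < replicas:  # scale up if there are too few
        matching.append({"name": f"pod-{next(_pod_ids)}", "labels": dict(template_labels)})
    return others + matching


if __name__ == "__main__":
    cluster = [
        {"name": "web-a", "labels": {"app": "web"}},
        {"name": "db-a", "labels": {"app": "db"}},
    ]
    cluster = reconcile(cluster, {"app": "web"}, replicas=3, template_labels={"app": "web"})
    print([p["name"] for p in cluster])
```

The application style follows from this: you describe the desired state through labels and replica counts, and the controller keeps nudging the cluster toward it rather than you placing containers by hand.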

[+] KirinDave|10 years ago|reply
If you're facing endpoints at the outside world and would like to do so easily and with pretty good reliability, Kubernetes on Mesos is probably the way to think about it.

Or should, when the K8S-Mesos stuff is a bit more mature.

[+] Florin_Andrei|10 years ago|reply
Anyone using Dkron? http://dkron.io/

What I really need is a "distributed cron", first and foremost, with the additional requirements of: it being lightweight on resources, and it being multiplatform. Dkron seems to fit the description pretty well, I'm looking for any feedback from users.

[+] isoos|10 years ago|reply
Are any of these platforms useful for a small company that wants to have a basic self-hosted environment for the usual stuff?

They are looking for services like e-mail (SMTP, IMAP, maybe webmail), website (static + maybe wordpress), source code hosting (Subversion) and reviews (?), CI (Jenkins?), devops, and CRM...

[+] cpitman|10 years ago|reply
Kubernetes can be used to deploy anything that you can put into a Docker container, including support for persistent volumes (i.e. "mode 1" applications). I've been using it recently to host XMPP servers, GitLab, Jenkins, etc.

I haven't installed Kubernetes directly, but I have set up and used OpenShift v3, which adds a PaaS solution on top of Kubernetes. Setting it up is really easy, and they've released an all-in-one VM to demo it: http://www.openshift.org/vm/

The other option is to use a hosted solution. Google Container Engine (https://cloud.google.com/container-engine/) is essentially hosted Kubernetes.

PS I work for Red Hat. OpenShift V3 is our product, and we contribute a lot to Kubernetes.

[+] __jal|10 years ago|reply
Hard to say without more detail, but I would lean towards probably not.

For the environment conjured in my head by what you're describing, I would probably use something like Salt or Puppet to drive installation, config management/monitoring and upgrades, possibly on top of oVirt or similar.

I don't see an advantage for tiny shops containerizing everything at this point, at least until some bright shiny future where containerization is much more of the norm and platforms like this are much more mature and easy to manage. Your sysadmin almost certainly has better things to do than adding wrappers and indirection to a tiny environment.

[+] jacques_chester|10 years ago|reply
I'd use a public PaaS. Heroku pioneered it. I like Pivotal Web Services because I work for the company which runs it; the same open-source platform (Cloud Foundry) powers BlueMix as well. There's also OpenShift from Red Hat.

Rolling your own is very time-consuming if your goal is to deliver simple apps.

[+] sciurus|10 years ago|reply
Anything to say about Flynn?

https://flynn.io/

[+] Titanous|10 years ago|reply
Flynn is a high-level platform that doesn't require you to think about low-level components like schedulers, overlay networking, service discovery, consensus, etc. Flynn does of course include a built-in scheduler and other components that accomplish many of the goals mentioned in the article.

(disclosure: I am a cofounder of Flynn)

[+] amouat|10 years ago|reply
The article focused on systems for running straight Docker containers, so I didn't investigate Flynn.
[+] geggam|10 years ago|reply
How many people actually need this level of scale vs. how many people are implementing this because it's $BUZZWORD?
[+] ownagefool|10 years ago|reply
It's not about scale, it's about manageability.

Having a set of uniform nodes where you schedule containers is nicer, in my opinion, than managing an infrastructure where you've scripted applications to go places.

Sure, you could make Puppet manage containers on uniform nodes, but then we're having a massive convention vs. configuration argument, which we shouldn't bother with because the distributed schedulers are giving us a lot of important stuff mostly for free.