Docker, Mesos, Marathon, and the End of Pets

64 points | ddispaltro | 10 years ago | blog.factual.com

35 comments

[+] KaiserPro|10 years ago|reply
I think the main problem with all of these systems is that they are just so damn complex.

Docker is great if you have no state. But then, if you have no state, shit is easy. Mapping images to high-speed storage securely and reliably is genuinely hard (unless you use NFSv4 and Kerberos).

Mesos is just overkill for everything. How many people actually need shared image programs bigger than a 64-core machine with 512 GB of RAM? (And how good are you at juggling NUMA or NUMA-like interfaces?)

I can't help thinking that what people would really like is just a nice, easy-to-use distributed CPU scheduler. Fleet basically, just without the theology that comes with it.

Seriously, mainframes look super sexy right now. Easy resource management, highly scalable, real UNIX. (No need to spin up/down, just spawn a new process.)

[+] justizin|10 years ago|reply
> Mesos (sic) is just over kill for everything.

That's definitely an opinion. :)

I have seen Mesosphere deployed with great success.

As far as state goes, this is one reason I'm not crazy about CoreOS - I feel more comfortable containerizing the application tier than the data tier, though both are certainly possible.

I'm really not eager to replace a highly tuned MySQL or Postgres machine with a container environment experiencing several levels of abstraction and redirection. I get frustrated enough trying to align partitions with block boundaries through RAID controllers.

But if you have 20 front-end app servers and 5 machines that run cron jobs, container services can help you to utilize your capacity much better. I can't say how many times I've worked somewhere that we desperately needed capacity, but didn't have the budget to expand until we cleaned up a bunch of machines that were vastly underutilized.
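To make the utilization point concrete, here is a minimal sketch of the kind of packing a scheduler does. It is not how Mesos or any particular scheduler works; it is plain first-fit-decreasing bin packing, and the core counts below are made-up numbers for illustration.

```python
def first_fit_decreasing(demands, capacity):
    """Pack integer CPU demands onto machines of equal capacity.

    Uses the first-fit-decreasing heuristic: largest demands first,
    each placed on the first machine that still has room.
    """
    machines = []  # each entry is the list of demands placed on one machine
    for demand in sorted(demands, reverse=True):
        if demand > capacity:
            raise ValueError(f"demand {demand} exceeds capacity {capacity}")
        for placed in machines:
            if sum(placed) + demand <= capacity:
                placed.append(demand)
                break
        else:
            # No existing machine had room, so start a new one.
            machines.append([demand])
    return machines


# Hypothetical workload: 20 app servers using 2 cores each and
# 5 cron boxes using 1 core each, all on 8-core machines.
workload = [2] * 20 + [1] * 5
packed = first_fit_decreasing(workload, capacity=8)
print(len(packed), "machines instead of", len(workload))
```

With these assumed numbers, the 25 dedicated machines collapse onto 6. Real schedulers also weigh memory, disk, and placement constraints, which this sketch ignores.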

Anyway, Mesosphere isn't perfect, and I have only ever used it moderately, but there's a lot of tooling out there which we can use.

Def agree on the weird theology of fleet, but also generally that it just doesn't do enough for me. It's way too much fucking trouble to say, "Run an http proxy on each physical machine".

[+] jacques_chester|10 years ago|reply
You might like Lattice[0], which is extracted from Cloud Foundry.

Basically, everyone is racing back to PaaSes. Heroku pioneered it and are still out there. Red Hat have OpenShift and are making noises about turning it into a Docker+Kubernetes thing in version 3. Cloud Foundry has been around for a few years now. There are other also-rans.

The thing is that apart from Heroku, you've not heard of installable PaaSes because they're being pitched to the Fortune 500s.

I've worked on Cloud Foundry and I work for the company which donates the most effort to the Foundation. It's been surreal to watch other people introduce pieces of a PaaS and see the excitement about the pieces. Meanwhile, we literally have an entire turnkey system already. If you need a full PaaS -- push your app or service and have it running in seconds, with health management, centralised logging, auto-placement, service injection, the works -- we built it already. Free and open source, owned by an independent Cloud Foundry Foundation.

Anyhow, I'm obviously biased, YMMV etc etc. But I'd play with Lattice, to get the hang of things.

[0] http://lattice.cf/

[+] zwischenzug|10 years ago|reply
There are some surprising places where you can enforce state where it seems impossible. Once you have that, the benefits coalesce.

That's why I built ShutIt, which we've used to encapsulate complex legacy environments to produce stateless builds:

http://ianmiell.github.io/shutit/

For example, teams can have a development environment (with _everything_ in it) rebuilt daily. As everyone uses it, everyone curates it, and they're all talking about the same thing - one pet if you like, rather than n, where n is the number of developer/development envs.

[+] pibefision|10 years ago|reply
Docker is also great for hiding complexity during implementation. Discourse.org is doing great work "enveloping" their complex Rails app in containers to ease the install process. And it is not stateless.
[+] steveb|10 years ago|reply
The real problem is going from tutorial to something you would use in production. Throw in logging, security and service discovery and you can have a few engineers hacking away for months.

So I want to plug a project I've been contributing to: https://github.com/CiscoCloud/microservices-infrastructure

We're trying to make it super easy to deploy these tools. For example, every time you launch a Docker container, it will register with Consul and be added to HAProxy. The nice thing about using Mesos is that we can support data workloads like Cassandra, HDFS, and Kafka on the same cluster you run Docker images on.

We use Terraform to deploy to multiple clouds so you don't get locked in to something like CloudFormation.
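As a rough illustration of the register-then-load-balance idea (not the project's actual code, and not the Consul API), the sketch below turns a list of registered container instances into HAProxy backend stanzas. The `Instance` record, the service names, and the addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One running container, in a hypothetical registry record shape."""
    service: str
    host: str
    port: int


def render_backends(instances):
    """Group instances by service and emit one HAProxy backend per service."""
    by_service = {}
    for inst in instances:
        by_service.setdefault(inst.service, []).append(inst)

    lines = []
    for service in sorted(by_service):
        lines.append(f"backend {service}")
        lines.append("    balance roundrobin")
        # Sort members so the rendered config is stable between runs.
        members = sorted(by_service[service], key=lambda i: (i.host, i.port))
        for n, inst in enumerate(members):
            lines.append(f"    server {service}-{n} {inst.host}:{inst.port} check")
    return "\n".join(lines)


registry = [
    Instance("web", "10.0.0.2", 31001),
    Instance("web", "10.0.0.1", 31000),
    Instance("api", "10.0.0.3", 31002),
]
print(render_backends(registry))
```

In a real setup something would watch the registry and re-render and reload the proxy config whenever a container starts or stops; that loop is left out here.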

[+] bkeroack|10 years ago|reply
This is basically why Kubernetes exists: for all the plumbing, discovery, etc. required on top of bare containers.

It still requires work to go from zero to a production-quality stack, of course.

[+] Wilya|10 years ago|reply
Is anyone running Marathon in production? Real production. The kind where any downtime means lost money.

I see a lot of intro-level tutorials, but almost nothing on the more advanced side.

My (completely casual) experience with Marathon is pretty bad, with the main process crashing quite regularly even under no load, so I'm wondering if people who write about these systems have actually used them for non-trivial tasks. And for something as critical as Marathon, which is supposed to handle... well... all my services, I'd rather be sure that the system is rock solid.

(This is specifically about Marathon. Mesos itself has proven more reliable)

[+] steve0ps|10 years ago|reply
I've been running Marathon in production (real production) to power more than 100 applications for the past six months. I chose it because it seemed like the most stable thing at the time; however, I quickly found it was not production ready. While many of the original issues I encountered in 0.7.x were resolved with the 0.8.x release, 0.8.x brought new issues such as stuck deployments, etc. Additionally, I have found the upgrade path to be obtrusive and frankly scary. I am actively moving away from Marathon because of these issues.

Marathon does not make using Docker or building microservices simple. There are many important pieces that Marathon does not provide. Sure, your operations team can tie in Mesos-DNS / Bamboo / Consul / whatever else, but it's going to take time, require a specialized team, and leave you feeling nervous about what happens if everything crashes in the middle of the night. Even when tying in these third-party tools, it is likely you will have to make significant code updates to utilize features such as service-discovery / SRV records. You will inevitably end up with a cobbled-together system that needs serious support from your operations team.

I am fairly frustrated as a whole with Mesosphere, and expected more from a company that raised so much capital.

[+] tjkells|10 years ago|reply
Running a complete hosted telephony service using nearly the exact stack defined in this article - https://developers.corvisa.com/

It has worked remarkably well and allowed us to scale up/down during peak hours or unexpectedly high traffic peaks.

[+] brndnmtthws|10 years ago|reply
Yes, many people have in fact run it in "real production". Go ahead and Google my name for credibility.

There were indeed issues with the 0.7.x series of Marathon, but we've made a big effort to focus on stability and performance in 0.8.x, and onward. As with any new software project, there tend to be issues in early releases.

[+] jacques_chester|10 years ago|reply
Factual have done what lots of people do, which is invent the first 20% of a PaaS.

PaaSes are awesome. They also, once you go past the basics, require enormous engineering effort. And that's the problem: engineering effort spent on curating your own homegrown PaaS is engineering effort not available for creating user value.

5 years ago rolling your own was a source of competitive advantage. Today you can get an installable PaaS (Cloud Foundry or OpenShift) off the shelf and run it. In 2 years Docker, Mesos and CoreOS will probably all have PaaSes of their own.

Interesting times.

[+] samkone|10 years ago|reply
I know two companies running OpenShift on Mesos. The reason modern PaaSes aren't enough is the fact that at some scale you're not just running web services, but also rather complex data pipelines involving distributed data systems like Kafka, Spark, Cassandra, etc., and a simple PaaS has issues handling those workloads. That's where Mesos shines. As for user value, I consider that a system with good uptime, which provides reliable and intelligent service based on data processing while using your resources efficiently, has important business value.
[+] copsarebastards|10 years ago|reply
To be honest, I have yet to see one of these systems that beats simply using Bash. They're all trying to make a scripting problem into a configuration problem. That's sometimes a reasonable idea, when what you're doing is common enough that only a few things here and there need to diverge from the defaults. But every image I've ever had to create contained far more edge cases than default cases, and half of those edge cases are things nobody thought of and therefore their system doesn't handle them. Rather than trying to fight with one of these systems to get them to do something I could easily do with a few Bash commands, I find it easier to just script the setup in Bash to begin with.

Obviously that doesn't work for Windows systems.

[+] tomjen3|10 years ago|reply
In that case you might want to look at Ansible: it does exactly what such a system should (logs in with SSH and runs some scripts) but does it in a smart way (e.g. you can configure certain boxes to be web servers and have it run a script on all web server boxes). It does have a somewhat weird config format, but it also allows you to run scripts.
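The "group boxes by role, then run a script on the whole group over SSH" idea can be sketched in a few lines. This is a toy illustration of the concept, not how Ansible is implemented; the inventory, hostnames, and script are hypothetical, and the function only builds the commands rather than running them.

```python
import shlex

# Hypothetical inventory: role name -> hosts that have that role.
INVENTORY = {
    "webservers": ["web1.example.com", "web2.example.com"],
    "dbservers": ["db1.example.com"],
}


def ssh_commands(inventory, group, script):
    """Return one ssh argv list per host in the group; nothing is executed."""
    remote = "bash -c " + shlex.quote(script)
    return [["ssh", host, remote] for host in inventory.get(group, [])]


for argv in ssh_commands(INVENTORY, "webservers", "systemctl restart nginx"):
    print(" ".join(argv))
```

Each argv list could be handed to `subprocess.run` to actually execute it; printing keeps the sketch safe to run anywhere.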
[+] justizin|10 years ago|reply
This is poetic:

  "Kubernetes has a Clintonesque inevitability to it"
:-P