
Getting Started with Docker

296 points | fideloper | 12 years ago | serversforhackers.com

I saw that people were looking for better getting started docs for Docker, so I put together the post I wish I found on Docker when I was digging into it.

75 comments

[+] Xdes|12 years ago|reply
This skips over the hard part: managing docker containers. Poking a hole directly to the container is a leaky abstraction. A reverse proxy like HAProxy or Varnish should be sitting in front of the container.

Once you have the reverse proxy set up, the next problem that arises is routing to containers based on the domain. Now your HAProxy or Varnish config is going to get bloated, and every time you deploy a container the config needs to be modified and reloaded. By this time you might be looking at Chef or Puppet for automating the config generation.

Chef and Puppet are not simple to learn. They have their own set of quirks (like unreliable tooling support on Windows). I'm in the process of conquering this, but I hope one day there will be a simpler way.
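The domain-routing step described above is what keeps growing: an HAProxy config that matches the Host header and picks a backend per container. A rough sketch (hostnames and container addresses here are hypothetical) of the block that has to be regenerated and reloaded on every deploy:

```
frontend http-in
    bind *:80
    acl host_app1 hdr(host) -i app1.example.com
    acl host_app2 hdr(host) -i app2.example.com
    use_backend app1_backend if host_app1
    use_backend app2_backend if host_app2

backend app1_backend
    server app1 172.17.0.10:8080 check

backend app2_backend
    server app2 172.17.0.11:8080 check
```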

[+] tomgruner|12 years ago|reply
This is a great point. The initial Docker examples make everything seem easy, but we blew way past our estimated time in integrating docker into our workflow because of the points you mention. I am still happy with the choice to use docker though and our team will be better at server administration in the future.

One thing about this getting started guide is that it recommends the Phusion base image which boots init. That seems to go against the best practices outlined in a recent article by Michael Crosby - http://crosbymichael.com/dockerfile-best-practices-take-2.ht...

[+] vidarh|12 years ago|reply
Update etcd with connection details on container start/stop. Then use a script to watch the appropriate directory in etcd for changes and regenerate the config.

Look at "fleet" from CoreOS, and especially their "sidekick" example that uses systemd dependencies to trigger etcd updates: https://coreos.com/docs/launching-containers/launching/launc... though you can certainly do this without fleet too.

Then on the haproxy/varnish box (or put them in a container), put something that does "etcdctl exec-watch /services/website -- updateconfig.sh", where updateconfig.sh would be a script to watch for changes and regenerate the config / reload.

I don't see how your config will get "bloated" any more than it would otherwise - presumably your number of domains won't increase.
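A runnable toy of what that "updateconfig.sh" regeneration step might look like. Every path and name here is made up, and a flat file stands in for the endpoints a real script would read from etcd under /services/website:

```shell
#!/bin/sh
# Toy updateconfig.sh: regenerate an haproxy backend section from a list
# of container endpoints, then (in real life) reload haproxy.
# /tmp/backends.txt stands in for the values watched in etcd.

BACKENDS=/tmp/backends.txt
CONF=/tmp/haproxy-backends.cfg

# Pretend etcd handed us these two endpoints
printf '10.0.0.5:8080\n10.0.0.6:8080\n' > "$BACKENDS"

{
  echo "backend website"
  i=0
  while read -r endpoint; do
    i=$((i + 1))
    echo "    server web$i $endpoint check"
  done < "$BACKENDS"
} > "$CONF"

cat "$CONF"
# A real version would follow with something like:
#   haproxy -f /etc/haproxy/haproxy.cfg -sf "$(pidof haproxy)"
```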

[+] goblin89|12 years ago|reply
> Poking a hole directly to the container is a leaky abstraction. A reverse proxy like HAProxy or Varnish should be sitting in front of the container.

It might be a stupid question but I wonder what's considered a leaky abstraction in this case.

By the way, I'm not sure I fully understand your concerns over reverse proxy routing, but I recall that Ambassador pattern linking[0] is a suggested way of tying Docker containers together over the network. These slides by dotCloud[1] may be helpful as well (I'm not sure whether the approaches described are up to date, though).

[0] http://docs.docker.io/en/latest/use/ambassador_pattern_linki...

[1] http://www.slideshare.net/dotCloud/deploying-containers-and-...

[+] mtrimpe|12 years ago|reply
I just came here to say basically the same thing, which I guess is the question shared by 80% of Docker's target market.

I have a box sitting somewhere which, like virtually any dedicated machine, is wildly overprovisioned for its current usage patterns.

I would like to virtualize my services so that I can one day, when my needs outgrow my box, scale out without having to rewrite any code.

My box has limited IPs available, so I'll need the network between services to be private/internal.

How do I set that up with Docker?

I don't think Docker will really take off until that question has a truly easy answer.

[+] stevekemp|12 years ago|reply
I've been thinking along these lines recently, specifically service discovery for front-end load-balancers.

Most (all?) of the available reverse proxies will stop sending traffic to a server that is offline, but they won't discover new ones. There are solutions such as etcd which you can hook into, or you can write a toy application that uses UDP broadcasts to advertise "Hey, I'm http://dev.local.com/ on port 4444", but there isn't a lot beyond that.

Templating configuration files and running "haproxy reload" is a common enough middle-ground, but I've seen it fail often. (Specifically keepalived not reloading correctly and still sending traffic to old nodes.)

ObRelated: Varnish is a beast that few people can configure easily. I'd love to work on a caching reverse proxy that was simple, extensible, and fast.

[+] corobo|12 years ago|reply
> Now your HAProxy or Varnish config is going to get bloated and every time you deploy a container the config needs to be modified and reloaded. By this time you might be looking at chef or puppet for automating the config generation.

Varnish at least can route using DNS [0] - you do need a nameserver or two to handle the internal domain, of course, but they're reasonably easy to set up using PowerDNS, for example.

[0] https://www.varnish-cache.org/docs/3.0/reference/vcl.html#th...
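For reference, the Varnish 3.0 DNS director the link describes looks roughly like this. The director name, network range, and internal suffix below are hypothetical; backends are chosen by resolving the request's Host header (plus the suffix) against your internal zone:

```
director container_backends dns {
    .list = {
        .port = "8080";
        .connect_timeout = 0.4s;
        "172.17.0.0"/16;
    }
    .ttl = 5m;
    .suffix = ".containers.internal";
}
```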

[+] fideloper|12 years ago|reply
I think you can define the IP address assigned to a container via something like `-p 127.18.0.10:80:80`, if that helps with your HAProxy config (but that assumes your host machine isn't changing as well).

Definitely an interesting issue. Have you seen etcd from CoreOS? Useful for service discovery.

[+] nigelk|12 years ago|reply
What are you finding unreliable in terms of tooling for Puppet on Windows?
[+] izietto|12 years ago|reply
For non-PaaS use cases (for example, a development server with a bunch of projects) I find schroot (1) simpler and more productive. For example, you can use the normal `service start` / `service stop` instead of manually writing init scripts, and you don't get stuck on sharing directories, which I found extremely tricky with Docker (for example, I couldn't correctly start MySQL with supervisor while sharing the MySQL db directory). But Docker is in early development, so I think it will become easier in the future.

1: https://wiki.debian.org/Schroot

[+] vidarh|12 years ago|reply
Here's one benefit you get with Docker: speed of rebuilds, and assurance that your build instructions and list of dependencies accurately reflect your actual environment. I basically have a "basic dev setup" container for all my projects now, and each project sits as a sub-directory of a directory on the host that I bind mount into the Docker containers. Each project then also has a Dockerfile which adds any project-specific dependencies.

Building a fresh container then takes a couple of seconds. And the projects run within those containers only. Every time I restart the apps in my development environment, I rebuild the container, because it is so cheap. This means I know at any time that the container can be rebuilt to a state the app will run in. I know when I want to deploy that the Dockerfile accurately reflects the dependencies, because otherwise my app wouldn't be running in the dev environment, as any and all changes to anything outside of the application repository are applied only through changes to the Dockerfile.
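A per-project Dockerfile in this kind of setup can stay tiny, since the shared base image carries the common tooling. The image name, package, and paths below are all hypothetical:

```dockerfile
# Layer project-specific dependencies on the shared dev base image
FROM mydev/base:latest
RUN apt-get update && apt-get install -y libxml2-dev
WORKDIR /src/myproject
CMD ["./run-dev.sh"]
```

Rebuilding after a change is then just `docker build -t myproject-dev .` followed by something like `docker run -t -i -v /home/me/projects:/src myproject-dev` (paths again hypothetical).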

[+] fideloper|12 years ago|reply
I've had the same issue with MySQL. It's an issue of timing - You install MySQL, and the MySQL data directory has its data/default databases. Then you share the directory with Docker, and the data lib directory is wiped out (the files don't exist on the host machine, after all). Getting it right in an automated way is a Hard Problem™.

As of now, I'm keeping data persisted within the Container, which I don't necessarily like. I would love to hear a good solution on that.
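One workaround people use for the timing problem is to seed the host directory before the first mount, so the bind mount never hides a populated directory. A runnable toy of the seed-if-empty logic (all paths are hypothetical, and the tarball would really be exported from the image, e.g. via `docker cp` or a `docker run ... tar` pipeline):

```shell
#!/bin/sh
# Seed the host data directory once, before mounting it into the container.

HOST_DATA=/tmp/mysql-data
SEED=/tmp/seed.tar

# Stand-in for a tarball exported from the image's /var/lib/mysql
mkdir -p /tmp/seed-src
touch /tmp/seed-src/ibdata1
tar -C /tmp/seed-src -cf "$SEED" .

# Only seed when the host directory is missing or empty
if [ ! -d "$HOST_DATA" ] || [ -z "$(ls -A "$HOST_DATA" 2>/dev/null)" ]; then
  mkdir -p "$HOST_DATA"
  tar -C "$HOST_DATA" -xf "$SEED"
fi

# Afterwards the usual mount no longer wipes anything:
#   docker run -v /tmp/mysql-data:/var/lib/mysql my-mysql-image
ls "$HOST_DATA"
```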

[+] robszumski|12 years ago|reply
CoreOS experience designer here. I'm looking for testers to check out the general platform and test some of our new features. All skill levels are fine – new to docker & CoreOS, new to CoreOS only, etc. I'm happy to work with your schedule and make it as quick or involved as you're comfortable with. Anything from emailing a few thoughts to Skype to hanging out in our office in SF for the day.

Email: [email protected]

[+] markbnj|12 years ago|reply
I've been using docker for a couple of months, but we have only just begun experimenting with actual deployment in a test environment on ec2. Right now we use it primarily as configuration/dependency management. We're a small team and it seems to make setup easier, at least so far. Two examples: the first is a log sink container, in which we run redis + logstash. The container exposes the redis and es/kibana ports, and the run command maps these to the host instance. Setting up a new log server means launching an instance, and then pulling and starting the container. The second example is elasticsearch. We have a container set up to have cluster and host data injected into it by the run command, so we pull the container, start it, and it joins the designated cluster. The thing I like about this is the declarative specification of the dependencies, and the ease of spinning up a new instance. As I say, just experimenting so far, and I don't know how optimal all of this is yet, so would love any feedback.

One last quick thought on internal discovery. A method we're playing with on ec2 is to use tags. On startup a container can use a python script and boto to pull the list of running instances within a region that have certain tags and tag values. So we can tag an instance as an es cluster member, for example, and our indexer script can find all the running es nodes and choose one to connect to. We can use other tags to specify exposed ports and other information. Again, just messing around and still not sure of the optimal approach for our small group, but these are some interesting possibilities.
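The tag-based lookup itself is simple; here is a runnable toy that filters a local stand-in inventory instead of calling boto's describe-instances (the file format, tag names, and IPs are invented for the sketch):

```shell
#!/bin/sh
# Columns: instance-id  role-tag  private-ip (a stand-in for EC2 metadata)
INV=/tmp/inventory.txt
cat > "$INV" <<'EOF'
i-0aaa es-node 10.0.1.5
i-0bbb web     10.0.1.9
i-0ccc es-node 10.0.1.7
EOF

# Pick one es-node to connect to, like the indexer script would
awk '$2 == "es-node" { print $3 }' "$INV" | head -n 1
```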

[+] tonyhb|12 years ago|reply
This is a copy and improvement of the article I wrote last month, even down to the breakdown of "What's that command doing?" with `docker run -t -i ubuntu /bin/bash`.

Glad it was useful enough to spur an improved article, at least.

http://tonyhb.com/unsuck-your-vagrant-developing-in-one-vm-w...

[+] fideloper|12 years ago|reply
I never saw your article before. Sorry, dude. Maybe great minds think alike tho!
[+] yblu|12 years ago|reply
Can someone tell me what's the point of this? (I'd seriously love to know; I'm not criticizing it.) Why would I need docker containers to install stuff on instead of just installing stuff directly on the host?

Let's say I develop a new web app: I would install NodeJS, PostgreSQL and such on my machine. Before I deploy the app for the first time, I'll install them on the necessary servers. Now, it looks like I would need to do the same, except with the added step of building Docker containers.

I think I must be missing something important here, because the number of GitHub stars for Docker is impressive and that's usually a good indication of the usefulness of a project.

[+] robszumski|12 years ago|reply
Docker containers let you isolate the entire environment for your app. Let's say you're running an app on CoreOS in a container that needs python 1.2.3.

On your laptop you can build and test the new version of the app that needs python 1.2.4. Once you decide that's ready to go, you can push the new container onto the same CoreOS machine, so it's running both containers. Without the containers, running two versions of python cleanly on the same box is very hard. If you had a Chef script that updated to 1.2.4, you'd possibly break every other app on the box.

Containers also let you do some cool things like sign and verify a container before it's launched on the box. It should be bit for bit the same on your laptop as it is on the remote machine. Containers also boot within seconds, much faster than a VM. There have been a few tech demos running around that actually spin up a new container with a web server to service every web request, just to show how fast you can boot them. 300ms is pretty long for a web request, but it's the idea that counts.

[+] cjbprime|12 years ago|reply
Imagine you wanted to run two apps on the same host, and they depended on different versions of those components, and you wanted to be able to install the app and its dependencies on a new machine really quickly for scaling reasons, and you wanted an exploit on one of the apps to be difficult to escalate over to the other app from, and you'll start understanding some of the pain points Docker solves. :)
[+] njharman|12 years ago|reply
> with Macintosh's kernel

I misread that as "Microsoft's..." and got excited, since I run a build farm that's 70% Windows and wish I could use Docker, but it's not worth having two systems (containers and VMs).

Also, isn't that completely wrong? Macintosh is not an OS or a company. It was one of Apple's product lines, long ago.

[+] zobzu|12 years ago|reply
VMs CAN share binaries/libs/etc. (otherwise known as files).

Also, VMs CAN "share" memory, i.e. VMs can dedup memory between themselves. On Linux, at least.

Not saying docker/lxc and all things namespaces are bad at all, but setting things straight: VMs can do this :)

Check out KSM for memory "sharing", and any overlay-style file system that is mounted by VMs (that one works exactly the same as when you use namespaces/docker/lxc, in fact).

[+] arianvanp|12 years ago|reply
Shouldn't "setting up a correct init process" be part of every "getting started with Docker"? http://phusion.github.io/baseimage-docker/
[+] bjt|12 years ago|reply
No. That guide assumes you're running a bunch of processes in the container (or even a full system). That's not the case at all when you're doing an "application container" that doesn't need its own cron daemon, ssh daemon, etc.

Containers can be much leaner than the kind discussed there.
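For an application container where you do want a minimal parent process, the reaping part is just a wait loop. A runnable toy of the idea, with `sleep` standing in for the real service:

```shell
#!/bin/sh
# Minimal "init"-style parent: start the service, forward signals, and
# sit in wait so exited children are reaped rather than left as zombies.
sleep 1 &
service=$!
trap 'kill "$service" 2>/dev/null' TERM INT

# As PID 1 you would loop (`while wait; do :; done`) so that re-parented
# orphans get reaped too; one child is enough for the sketch.
wait "$service"
```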

[+] jafaku|12 years ago|reply
Wow thanks! I didn't know about the init process and the zombies.
[+] calgaryeng|12 years ago|reply
I wish that people would stop writing tutorials on "getting started" with Docker, and actually start writing up examples of how to work with multiple containers, hosts, and linking.

That's the part that I (and I'm sure other beginners) get totally stuck on. Anyone can do docker commit/pull.

[+] fideloper|12 years ago|reply
I actually haven't found many getting started articles, which is why I wrote this. However I fully plan on writing up more interesting stuff.
[+] netcraft|12 years ago|reply
This is the first time I have heard of CoreOS - it seems to be custom-built for containers like Docker. Are there downsides to doing system updates this way and not having a package manager, just relying on containers for everything? Seems great in concept.
[+] barrkel|12 years ago|reply
I don't know how well this works as soon as you have a single file that needs multiple edits to support multiple images.
[+] pg_fukd_mydog|12 years ago|reply
Would it be better to use FreeBSD and their Jails mechanism for all of this?
[+] e12e|12 years ago|reply
Joyent would probably claim that kvm+zfs would be best. But if you don't have kernel support for jails, then no, using jails isn't better. It's not an option. Oracle would probably claim solaris zones are better (and arguably, they'd be right).

Jails are (as far as I can tell) great, but not so great that FreeBSD didn't include a new HVM-assisted hypervisor in FreeBSD 10 (bhyve, the BSD hypervisor).

LXC is, in many ways, *BSD jails for Linux.

[+] izietto|12 years ago|reply
Or schroot on Linux, as I wrote in another comment