item 2299409

Why Puppet Should Manage Your Infrastructure

78 points | twampss | 15 years ago | engineyard.com

40 comments

donw | 15 years ago
I have one big gripe about Puppet:

Puppet requires both a puppet server (Rails) and client, SSL key exchange, firewall rules for the puppet server, proper DNS records for everything, and a host of dependencies, all of which you need to set up before you can actually do anything.

Any system management solution that requires anything more than a bare machine with SSH and sudo is, in my book, not terribly practical, because these are the lowest common denominator on what you'll get from any hosting provider or OS install.

In a nutshell: I shouldn't need to configure my servers before I configure my servers.

dan_manges | 15 years ago
You can use Puppet without doing all of that. At Braintree, we use Capistrano to upload a tarball of our Puppet scripts and run the puppet command.

sudo puppet --templatedir $HOME/puppet/templates --factpath $HOME/puppet/facts puppet/puppet.pp

asenchi2 | 15 years ago
Read the bottom of the article; I mention that you can simply run 'puppet apply' against a manifest (for versions > 0.24.x). Grow from there.
wfarr | 15 years ago
Given that, I'd definitely recommend you check out Rump: https://github.com/railsmachine/rump

It'll let you manage your servers in a headless manner, with your puppet configuration kept under simple git version control.

[Disclaimer: I work for Rails Machine.]

Goladus | 15 years ago
> Puppet requires both a puppet server (Rails) and client, SSL key exchange, firewall rules for the puppet server, proper DNS records for everything, and a host of dependencies, all of which you need to set up before you can actually do anything.

Not to join the chorus of "that's not actually true," but I do want to take it a step further and say that's not even the way I'd recommend using puppet.

The puppet server gives you an alternate authentication method (SSL vs SSH is a toss-up imho), a fileserver (rsync is better), a "dumb client" model where clients are only given the files and configs they actually need, and a master server to process all the manifests and load them into memory, etc., which might under some circumstances (which I have never encountered) help performance. That's about it. If anyone else has other insights on the benefits of deploying a master I'd be happy to hear them.

If you don't need any of those things, you don't ever need to deploy a puppet server. You need ruby and its dependencies, you need facter and its dependencies, and possibly a couple of other libraries (ruby-augeas, etc.), most of which are built into modern distributions. Ideally you will use revision control as soon as possible. Rsync your manifests and run the puppet client on them directly. That method scales up or down really well and is generally more flexible (see the comment below about the difficulty of testing changes).
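That masterless workflow might look something like this (the hostname and paths here are made up for illustration; the bare `puppet` invocation is the 0.24-era standalone command):

```shell
# Push the manifest repository to the target, then apply it locally there:
rsync -az --delete ./manifests/ web01:/srv/puppet/
ssh web01 'sudo puppet /srv/puppet/site.pp'
```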

DougBarth | 15 years ago
We've been using Puppet to manage our servers for some time now. As a group of developers doing our own operations work, we've found puppet both good and bad.

Setting up puppet was relatively straightforward. We had puppetd auto-updating our servers for a while, but ultimately decided to run it manually when deploying changes; managing zero-downtime changes was more error-prone with it running.

Some aspects of Puppet have proved frustrating over time. The top annoyance is that we never quite figured out a good way to test our puppet changes before checking them into git to deploy them to our puppetmaster, which has led to a number of "fixing errors" commits. The second annoyance is actually highlighted as a feature: no implicit ordering of operations. While it might sound great to be able to reorganize your configs without fear of breaking the deployment, the tradeoff is that you don't find out your configuration defines its dependencies incorrectly until you try to kick a new server after spending months incrementally adding to your existing ones. For us, at least, an implicit top-to-bottom ordering would lessen that headache.
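For what it's worth, the missing ordering can be declared explicitly with resource dependencies; a minimal sketch, assuming an ntp-style setup (names are illustrative):

```puppet
package { 'ntp': ensure => installed }

file { '/etc/ntp.conf':
    source  => 'puppet:///ntpd/ntp.conf',
    require => Package['ntp'],            # file only after the package
}

service { 'ntpd':
    ensure    => running,
    require   => File['/etc/ntp.conf'],   # service only after the config
    subscribe => File['/etc/ntp.conf'],   # and restart when it changes
}
```

Declaring these edges is exactly the work that goes unnoticed until a fresh server exercises the whole graph at once.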

Despite these headaches, simply having our configuration in version control is a huge win for us. We can set up a box much more easily, and we have a trail of commit messages explaining why changes were made.

Goladus | 15 years ago
About testing puppet changes...

If I had to do it again I would probably ditch the puppetmaster altogether and use an rsync server to distribute the entire configuration repository to every server, then run puppet locally to apply changes. This way you can simply modify any local repository and run puppet to apply the desired configuration to any machine you want. When you're happy with the changes you can check them in.

Using the puppetmaster and the puppet fileserver was trickier; essentially I would use FACTER_var="value" to pass in a value telling puppet to use local files rather than central files (which came pretty close to the purely decentralized model anyway).
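The FACTER_ trick works because any FACTER_foo environment variable shows up inside the manifest as the fact $foo. A sketch of how that switch might look (the variable name and paths are hypothetical):

```puppet
# Run as, e.g.:  FACTER_filesource=local puppet site.pp
if $filesource == 'local' {
    $configfiles = '/home/me/puppet/files'   # test from a local checkout
} else {
    $configfiles = 'puppet:///files'         # normal central fileserver
}

file { '/etc/ntp.conf':
    source => "$configfiles/ntpd/ntp.conf",
}
```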

wfarr | 15 years ago
You may be interested in checking out Rump: https://github.com/railsmachine/rump

[Disclaimer: I work for Rails Machine.]

It's essentially a headless puppet centered on a workflow of testing changes from an individual checkout of the puppet code on a target server: doing no-op applies of the manifests, then applying them until you're happy enough to commit, push, and roll out.

This won't help with your second annoyance, sadly, but it should definitely help with the first in quickly pinpointing these sorts of issues without having a messy commit history.

asenchi2 | 15 years ago
Regarding your troubles with "ordering of operations": I've found it varies by installation, but each team works to avoid these issues by setting a standard for module development, so that when you "include 'ntp'" you know exactly what you are getting. I've seen many different ideas on how to accomplish this, all of which made modules really easy to include without side effects.

Also, regarding testing: I think this is an issue with both Chef and Puppet, and something I hope someone addresses in the future. I've seen some custom tools with some promise (Chef-focused), but perhaps a Vagrant setup might be the best answer these days.

nwmcsween | 15 years ago
You know what, I can't even express the amount of dislike I have for puppet, from variables with four purposes (whoever thought up :ensure, which can mean "latest", a specific version, a requirement of being available or not, or whether a service is running, needs to be shot, buried, and encased in cement) to the DSL that tries to be declarative even though puppet isn't, and allows for half-installs on failure. Chef isn't any better, as it's extremely opinionated: AMQP, you have to use it; the deprecated merb, you have to use it. Cfengine is in its own world of suck (ever write unportable scripts with no abstractions? well, you do now). I'm not being snarky; I gave each a fair shot while evaluating them by implementing a provider for a distribution.
legooolas | 15 years ago
There have been a lot of suggestions (here and on other sites) to run puppet against locally-rsynced (rsunk?) copies of manifests, but there are a few things that won't work if you do this, unfortunately. The most important is `storedconfigs', which (afaict) requires the puppet server to work.

This means you lose a large amount of puppet's power: stored configs let you use configuration across machines to do things like collect all the services you run on a set of machines and generate a nagios config, or a firewall config, or whatever. Without stored configs I assume it's still possible, but it will require more explicit configuration, rather than the more elegant solution you get with a puppet server.
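For readers unfamiliar with the feature: stored configs enable exported resources, where one node exports a resource and another node collects everything that was exported. A rough sketch of the nagios case (resource parameters are illustrative):

```puppet
# On every web server: export a check for this host (the @@ prefix
# stores it on the puppetmaster instead of applying it locally).
@@nagios_service { "check_http_${fqdn}":
    host_name           => $fqdn,
    check_command       => 'check_http',
    service_description => 'HTTP',
}

# On the monitoring host: collect every exported check into its config.
Nagios_service <<| |>>
```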

Side note: I've used puppet on a fairly small scale of up to ~50 machines, and have just started using it for VMs. It's pretty straightforward to integrate into a bootstrap install so that ruby and puppet get installed first and puppet can then install all the remaining dependencies. But of course, most of the value comes from changes later on rather than at install time, when there are already a huge number of tools to set up or image machines.

Side side note: I've not used Chef to compare this with.

mtrn | 15 years ago
I'm setting up a single server, and even there, puppet and chef come in very handy. I can reuse the recipes on a local Vagrant-managed VirtualBox VM and test both the server configuration and the deployment.

At the moment I like chef-solo a bit better (because it uses an internal dsl).

Since I'm just beginning with puppet standalone and chef-solo: are there some longer-term experiences, pitfalls, etc., you can share?

asenchi2 | 15 years ago
I'll write up more about Chef later, but I really look at the two differently. Puppet is really great at managing infrastructure and server state. Chef is really good at integrating with your application (especially if you are using Ruby). I typically think of Chef as a framework to program your infrastructure against. Puppet is more of the middle manager. :)

Both are easy to test, with Puppet winning slightly thanks to 'puppet apply <manifest>'. Chef-solo is nice but takes a little more to set up (a solo.rb and node.json, for example). Either way, test and see what you think will work best for you.

thwarted | 15 years ago
We use puppet at Yelp. It's okay, but not perfect (we're using 0.25 on a mix of CentOS and Ubuntu). Here are some gotchas and pitfalls I've run into:

It uses a tremendous amount of memory, both the puppetd clients and the puppetmaster server. We were experiencing regular crashes (unrelated to memory usage, AFAICT) when we were on 0.24, so we have init/upstart/ubuntu-process-management-du-jour manage it.

Puppetmasters seem to stop responding and (from what I can tell from lsof) forget about some file descriptors every so often, and we need to hard-restart them, usually using kill -9.

There isn't solid support for distributing files via any method other than the puppet:// scheme (although http support is in the works), which means the puppetmaster must both evaluate the configuration and serve files, and it doesn't seem very efficient at serving files.

The documentation is less than stellar. Valid examples are not included, and there are exceptions to exceptions in the DSL. For example, the defined() function determines if a class or resource has been defined; for resources you do defined(ResourceType[title]), and for classes you do defined("class::name") (defined(Class['class::name']) doesn't work here, even though you specify dependencies using Class["class::name"] syntax). I had to find this out by digging deep in the bug tracker and mailing list. I find the documentation difficult to navigate, there's no unified "here's the syntax" document, and there aren't enough indications of which version of puppet supports which language constructs.
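Concretely, the two defined() forms described above look like this (class and resource names are illustrative):

```puppet
# Resources: capitalized type with a bracketed title.
if defined(File['/etc/ntp.conf']) {
    notice('the ntp.conf resource is declared')
}

# Classes: a quoted string; defined(Class['ntp::server']) fails here,
# even though dependencies are written as Class['ntp::server'].
if defined('ntp::server') {
    notice('the ntp::server class is declared')
}
```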

The certificate management is extremely subpar. By default the puppet clients connect to a host named puppet, but the puppetmaster generates certificates with a CN of the puppetmaster's own hostname. This made setting up multiple, interchangeable, load-balanced puppetmasters problematic -- the puppet clients complain that the server identity changed between runs. The CN of the puppetmasters should be "puppet". There are options to override the CN and the Alternative Names when the CA and PM certs are generated, but we had trouble getting them to work -- the problem was easy to figure out once we realized the fields in the certificates were always being generated wrong. We had to settle on generating a puppetmaster certificate once with the right values, then copying it to all our puppetmasters (really, this is how you manage SSL for a cluster of web servers: you don't have a certificate for each web server with its own hostname in the CN, you have one for *.example.com or www.example.com and every server serves that name). We also had to turn on autosigning, and we clean out the certificate store on the puppetmasters periodically to avoid certificate signing conflicts between puppetmasters. The SSL is a nice feature, and I definitely see it as a necessity for security purposes, but it could be cleaner.

You definitely need multiple puppetmasters if you have a largish environment. I don't consider our environment especially large, but we've had load issues when we ran one puppetmaster. Even distributing the puppet runs using the splay option didn't help.

A guy on my team wrote a function to recursively template a directory of files. This made mass file management easier; otherwise you need to specify each file individually in a file {} stanza.

We have scripted setting up a puppetmaster and a puppet client, and modified the default (I believe ubuntu-provided) init.d script to pass the command-line options related to the next point...

I had issues with the defaults specified in puppet.conf (and puppetd.conf and puppetmaster.conf, or something), with the section names in the files (they are in .ini format), and with getting the command line to override them. It's been a while since I had to deal with this (since we worked around it), but there's a thread at http://www.mail-archive.com/[email protected]/msg0... about the --config command line option. Related to this, we run the puppetmaster with a config dir of /etc/puppetmaster and a vardir of /var/lib/puppetmaster. This has made things a lot easier; by default, everything goes in /etc/puppet and /var/lib/puppet, and the files for the puppet client and the puppetmaster get mixed in together when running puppet on the puppetmaster. Since we've scripted both the client config and the puppetmaster config, it's easy to just blow one away and recreate it.
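In other words, something along these lines (a sketch based on the directories named above; exact flags varied across 0.2x releases):

```shell
# Give the master its own directories so its state never mixes with
# the local puppet client's /etc/puppet and /var/lib/puppet:
puppetmasterd --confdir=/etc/puppetmaster --vardir=/var/lib/puppetmaster

# The client on the same box keeps the defaults:
puppetd --server puppet.example.com
```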

We didn't use custom facter facts or custom functions on the puppetmaster initially, but I recently set up our environment to support them, and if you know ruby (or can muddle through it), it's reasonably easy to extend the capabilities.

We mainly use it to distribute files and create user accounts; we've had problems on and off with anything more advanced (even service management has been a problem at times -- things stopping and starting when they shouldn't -- but I attribute this to general issues with ubuntu moving between versions of upstart). Having modules that do things like manage apache config, or sudoers, or nagios config might come in handy if you started out using puppet, but when you're moving an already-established config to puppet, it's easier to just distribute the files. Especially when distributions like ubuntu (debian?) support a subdir of apache config files managed with symlinks.

I don't mean to present it like it's all bad. It has allowed us to centralize and version most of our config and bring new machines into service relatively fast. We were throwing around using it to configure EC2 instances, but really, I think it would be easier (and faster) to use custom AMIs. We have not had to do this yet, though.

Some of these issues may be fixed in 0.26; we haven't gotten around to playing with it yet.

So it has its quirks, and it's not so bad if you really spend time learning it and have enough experience to come up with workarounds for the pain points -- no different from any other software package. Considering it's what I know and I'm aware of the quirks, I'd use puppet on other networks. And I'm sure some of the problems we've had are because we're doing something non-standard or using it in a unique or unsuggested way.

I really should write up some of our recipes to help out other people.

_b8r0 | 15 years ago
Puppet is one of those things I like in principle but that is too much of a PITA to set up. Take, for example, the class definitions: they don't appear to offer a great deal more than a shell script, and in the example shown in TFA, for the price of about 6 lines of puppet code we could've just run:

        rsync -e ssh -avz ntpd.conf puppet@server:/etc/ntpd.conf && chown root:root /etc/ntpd.conf && chmod 644 /etc/ntpd.conf

Of course, in the real world you'd have a tarball you'd rsync over, then use SSH to extract and run the base script, and robert's your father's brother. A lot simpler, and the way I'd automated Solaris admin years ago. Puppet's drawback is that it doesn't offer anything sufficiently compelling for people to change from what they already use, and its syntax presents an awful lot of work for people getting started. Once it's up and running it's brilliant; I've seen it. But it seems like so much hard work to get there that it acts as a barrier to entry.

Goladus | 15 years ago
The original example is actually kind of bad and doesn't demonstrate puppet's abstraction facilities.

You can define a custom resource, for example "system_file", that provides default "root:root, mode 444" permissions, such that you just have to define a source and destination for every file, overriding the defaults when you want.

        define system_file($mode = 444, $owner = root, $group = root, $content = '', $source = '', $ensure = 'present') {
            file { $name:
                owner  => $owner,
                group  => $group,
                mode   => $mode,
                ensure => $ensure,
            }

            if $source != '' { File[$name] { source => $source } }
            if $content != '' { File[$name] { content => $content } }
        }
So in the article's example, it would look like this:

        system_file { "/etc/ntp.conf":
            source => "puppet:///ntpd/ntp.conf",
            require => Package["ntp"]
        }
Or even

        system_file { "/etc/ntp.conf":
            source => "$configfiles/ntpd/ntp.conf",
            require => Package["ntp"]
        }
Where $configfiles might be the puppet server or some other location. One of the things you get with puppet is access to any of the local host properties that can be discovered with facter, so you can dynamically configure something like a source file.
vacri | 15 years ago
"price of 6 lines of puppet code" = 90 characters

price of your example = 105 characters

forsaken | 15 years ago
The real point to make about Chef and Puppet is that they are so similar it really doesn't matter which you use. Choosing to use one at all is a much more important decision than which one you pick.

I don't know the best way to express this sentiment (feels like there should be a word for it). But really, just use something to automate your infrastructure and your life will be measurably better.

DanielBMarkham | 15 years ago
It's been extremely interesting to watch these meta server tools evolve. We're reaching the point where there's not much difference between a scripted network graph and a suite of VMs with cloning abilities. Each technique has its advantages, though. Perhaps somebody with large-scale infrastructure experience could do a side-by-side comparison?
jollojou | 15 years ago
We configure our production servers and push new releases to them with Puppet. I like Puppet: it's fail-safe and reliable.

There is, however, one thing I don't fancy about it: Puppet does not support insecure client–master communication. Requiring SSL is fine as a default, but one should be able to switch it off when it brings no value.

We run our servers on AWS and rely solely on AWS security groups to grant and deny access. Puppet's SSL traffic brings no additional security for us; it only complicates matters. For example: we would like to shut down the Puppet master EC2 instances when they are not needed, but this is not possible, since after start-up the EC2 instances have new IPs, and this breaks the Puppet-signed SSL certificates.

nimrody | 15 years ago
A small question:

Assuming you use a single type and version of OS (say, ubuntu vXXX), does it make sense to use the OS's native packaging system instead of something like Puppet?

I.e., maintain a private packages repository where you add your custom packages, and have the various servers pull from that repository?

Obviously, this doesn't work if you have different types of servers - but for many servers configured identically, it should work.
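A minimal sketch of that approach on Debian/Ubuntu (the repository URL and package name are placeholders):

```shell
# Point apt at the private repository:
echo 'deb http://apt.internal.example/ubuntu stable main' | \
    sudo tee /etc/apt/sources.list.d/internal.list

# Each server then pulls its configuration as an ordinary package:
sudo apt-get update
sudo apt-get install mycompany-base-config
```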

Goladus | 15 years ago
Of course, for many different servers configured identically, you could use systemimager instead, which is OS-agnostic.