top | item 21542217

The architecture of declarative configuration management

82 points | zbentley | 6 years ago | blog.nelhage.com | reply

20 comments

[+] gorgoiler|6 years ago|reply
Building your own version of something is surely self indulgent wheel reinventing, but that’s what I’m currently doing with distributed configuration management.

It’s certainly been helpful in terms of understanding the boundaries between parts of the system, as this post also describes. The desire to auto configure everything is strong — one day you’ll have a VLAN hard coded into the config, but the next day you’ll be trying to programmatically distribute VLAN ids based on function instead. The day after that, VLANs themselves are an artifact generated from a higher level separation in your human readable config. What was once a list of hosts with an attached VLAN id is now a group of hosts with a declared function that just happens to be programmatically assigned a VLAN id, but only as an implementation detail.

The same happens with IP address management — your root configuration moves closer and closer to being a document describing what you want to do, and less about how to go about doing it (which is implemented in your custom augmentations to the engine instead).
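As a toy illustration of that shift (not the commenter's actual tool; the host names, functions, and base VLAN id below are all invented), the config can declare only each host's function, while VLAN ids become a derived implementation detail:

```python
from itertools import count

def assign_vlans(hosts, base=100):
    """Derive a VLAN id per declared function; hosts never name VLANs directly."""
    vlan_ids = {}          # function -> programmatically chosen VLAN id
    next_id = count(base)  # hand out ids starting at `base`
    plan = {}
    for host, function in sorted(hosts.items()):
        if function not in vlan_ids:
            vlan_ids[function] = next(next_id)
        plan[host] = vlan_ids[function]
    return plan

# The human-readable config only says what each host is *for*:
hosts = {"web1": "frontend", "web2": "frontend", "db1": "database"}
plan = assign_vlans(hosts)
```

Here the "what" (host functions) lives in the document and the "how" (id allocation) lives in the engine, which is the direction of travel the comment describes.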

When you can justify it as an exercise in understanding a system, and you have time for it, building your own tool chain is incredibly rewarding.

[+] paulddraper|6 years ago|reply
> We could imagine resolving this tension if Terraform had two different convergence engines...The “create a new environment” engine, which always creates from scratch every resource it was given. This would excel at spinning up fresh environments as quickly as possible, since it would have to perform a minimum of introspection or logic and would just issue a series of “Create()” calls.

This just doesn't make sense; introspection usually allows you to apply changes more quickly. For example, it takes seconds to describe and update an existing AWS ELB; it takes minutes to delete and create a new one.

If you really want to forgo analysis and reuse of existing infrastructure, just do

    terraform destroy
    terraform apply

> Importantly, however, it by design will never issue a destructive operation, and will error out on changes that cannot be executed non-disruptively.

The notion of a "destructive operation" is not clear cut. Is it destructive to remove a file from S3? To update a file in S3? To delete a tag on an S3 bucket? To update a tag on an S3 bucket?

You can just manage this with permissions; that way you can specify exactly what is and isn't an allowable operation. In fact, this is best practice as it protects against bugs or misuse of the tool. Since Terraform already defaults to non-destructive, adding infrastructure-level permissions would cause it to work exactly as described.

A better example of customizable convergence would be the lifecycle management options Terraform already has, such as create_before_destroy which ensures the new resource exists before the old one is deleted.
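A hedged sketch of the two engines the quoted article imagines (the resource names and the create/update/delete vocabulary here are invented for illustration, not any tool's API):

```python
def fresh_environment(desired):
    """'New environment' engine: no introspection, just issue Create() calls."""
    return [("create", name) for name in desired]

def nondestructive_update(desired, actual):
    """'Production' engine: diff against reality, refuse to delete anything."""
    ops = []
    for name, spec in desired.items():
        if name not in actual:
            ops.append(("create", name))
        elif actual[name] != spec:
            ops.append(("update", name))
    orphans = set(actual) - set(desired)
    if orphans:
        # Error out rather than destroy, per the quoted design.
        raise RuntimeError(f"would require destroying: {sorted(orphans)}")
    return ops
```

The trade-off discussed above falls out of the diff: the second engine can often turn a change into a cheap in-place update, while the first would recreate everything from scratch.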

[+] purpleidea|6 years ago|reply
I'm working on something called mgmt: https://github.com/purpleidea/mgmt/

It runs as a distributed system, and is reactive to events, both in the engine and in the language (an FRP DSL), which allows you to build really fast, cool, closed-loop systems.

Have a look!

[+] bandrami|6 years ago|reply
Seems odd to talk about declarative configuration management and not mention Nix or Guix.
[+] di4na|6 years ago|reply
It sounds like you want something closer to a Prolog-like language in which you could specify the rules for the engines to respect...
[+] pjbk|6 years ago|reply
Exactly, and structural and functional constraints over properties and rules.

I understand the need to reinvent the wheel, but most of these efforts feel to me like customizations that most declarative languages can provide, albeit possibly in a non-intuitive syntax.

[+] ratiolat|6 years ago|reply
Salt, for some reason, is not discussed in the article. It's declarative.
[+] leg100|6 years ago|reply
He's spot on about separating "configuration generation" from convergence. There is no reason for the two to be the same system, the same tool. As he says, Kubernetes is only concerned with the latter, whereas Puppet, Chef, and Terraform conflate the two (insofar as Terraform uses HCL).

And for all the talk of "declarative", there is no reason why the configuration generation stage cannot be imperative, a la Pulumi. It is the desired end state - the catalog that's being generated - that is declarative.
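A minimal sketch of that split, in the spirit of Pulumi (the JSON catalog shape below is invented, not any tool's real format): the generation stage is ordinary imperative code, and only its *output* is a declarative desired-state document.

```python
import json

def generate_catalog(env, replicas):
    """Imperatively build a declarative catalog of desired resources."""
    resources = []
    for i in range(replicas):  # plain loops, conditionals, helper functions...
        resources.append({
            "type": "vm",
            "name": f"{env}-web-{i}",
            "size": "small" if env == "dev" else "large",
        })
    return json.dumps({"resources": resources}, indent=2)

# A separate convergence engine would consume this static end state;
# it never needs to see the imperative code that produced it.
catalog = generate_catalog("dev", 2)
```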

[+] _frkl|6 years ago|reply
I mostly agree, with the caveat that in my experience, if the configuration generation stage is entirely imperative it is harder to reason about it. That might not be a problem for low-complexity setups, but can get quite important (and bad) in some more involved cases.
[+] jcollins|6 years ago|reply
"Pluggable convergence engines" are exactly what we've built in Gyro[1], for this very reason. We wanted to have more control over how changes are made in production.

An example is doing blue/green deployments where you want to build a new web/application layer, pause to validate it (or run some external validation), then switch to that layer and delete the old layer. All while having the ability to quickly roll back at any stage. In Gyro, we allow for this with workflows[2].

There are many other areas we allow to be extended. The language itself can be extended with directives[3]. In fact, some of the core features, like loops[4] and conditionals, are just that: extensions.

It's also possible to implement the article's concept of "non-destructive prod" by implementing a plugin that hooks into the convergence engine's events (we call it the diff engine) and prevents deletions[5].

We envision folks using all these extension points to do creative things. For example, it's possible to write a directive such as "@protect: true" that can be applied to any resource and would prevent it from ever being destroyed using the extension points described above.

[1] https://github.com/perfectsense/gyro [2] https://gyro.dev/guides/workflows [3] https://gyro.dev/extending/directive/ [4] https://gyro.dev/guides/language/control-structures.html [5] https://github.com/perfectsense/gyro/blob/master/core/src/ma...
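To make the "@protect" idea concrete, here is an illustrative sketch only (this is the general shape of such a hook, not Gyro's actual plugin API): a handler replays diff-engine events and vetoes deletes on protected resources.

```python
def apply_events(events, protected):
    """Replay (action, resource) diff-engine events, vetoing protected deletes."""
    applied = []
    for action, resource in events:
        if action == "delete" and resource in protected:
            raise PermissionError(f"refusing to delete protected resource {resource!r}")
        applied.append((action, resource))
    return applied
```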

[+] xmly|6 years ago|reply
That is why immutable infra has become popular. You can easily destroy and rebuild the whole thing.

And for a prod env, what we're discussing sounds like update behavior to me. In CloudFormation, you can choose different update policies.

Compared with each cloud's provisioning engine (CloudFormation, gcloud Deployment Manager, Azure Resource Manager), Terraform is lacking a lot of features. So unless you are dealing with a private cloud, using the cloud's default provisioning service is a no-brainer.

[+] cosaquee|6 years ago|reply
CloudFormation is often lacking support for resources that are new or less popular. Terraform is much better at this; it supports most resources from the start, as far as I know.
[+] billsmithaustin|6 years ago|reply
My experience is that the production operations engine is hard to get right, because your target environment can drift from the desired configuration for reasons you did not anticipate.