iliaxj|1 year ago
When it gets deployed by CI/CD, the right tfvars file is passed in via the `-var-file` parameter. A standard `env` var is also passed in and used as the basis for a naming convention. The backend is also set by the pipeline.
The rationale here is that our environments should be almost identical, with any variation accomplished through parameterization.
Modules are kept either in separate repos, if they need to be shared across many workspaces, or under the `modules` subfolder.
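A minimal sketch of the CI/CD wiring described above (file names, the bucket key, and the resource are invented for illustration):

```hcl
# CI/CD runs something like:
#   terraform init  -backend-config="key=myapp/${ENV}/terraform.tfstate"
#   terraform apply -var-file="tfvars/${ENV}.tfvars" -var="env=${ENV}"

variable "env" {
  type = string
}

# The standard `env` var drives the naming convention.
locals {
  name_prefix = "myapp-${var.env}"
}

resource "aws_s3_bucket" "assets" {
  bucket = "${local.name_prefix}-assets"
}
```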
based2|1 year ago
https://www.weekly.tf/
FrenchyJiby|1 year ago
Yeah, I'm confused why none of the solutions presented deploy the same TF across the environments: surely if you have dev vs. prod, 90% of dev infra is also needed in prod?
Then sure, sprinkle in a few per-env toggles, `count = var.env == "dev" ? 1 : 0` (or whatever the nicest way to do that is today), but it feels weird to have to set up an artificial TF module around our entire codebase just to be able to share the bulk of the code.
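Spelled out, that toggle is just a ternary on `count` (the resource here is chosen arbitrarily):

```hcl
variable "env" {
  type = string
}

# Present only in dev; count = 0 drops the resource everywhere else.
resource "aws_cloudwatch_log_group" "debug" {
  count             = var.env == "dev" ? 1 : 0
  name              = "/myapp/debug"
  retention_in_days = 7
}
```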
OJFord|1 year ago
This is terrible. All the options seem to assume state is stored locally (not, say, in S3). Many options are just the same as others but with only one environment, or with some project-specific difference like 'there is a backend and a frontend' or 'a few services', which has nothing to do with how you structure the Terraform side of it.
All of them either don't address multiple environments or handle them with multiple directories (and assume a static list). What?! No: use Terraform workspaces, or a single module that you instantiate for each environment. Or Terragrunt if you really want. TFA is just a blogspam mess from either AI or someone who's spent about 20 minutes with a YouTube video learning Terraform before pushing out their latest "deep dive" content.
bloopernova|1 year ago
carty7|1 year ago
The entirety of this research was about the structure of directories.
Storing state in S3 or TFC or Spacelift or somewhere else is out of scope. S3 is where 90% of the world stores its state, and writing those configuration lines is covered by plenty of other resources.
I struggled to find an exhaustive list of how people manage their directory structures, hence the focus of this piece.
If you'd like to provide constructive feedback, beyond comments about scope creep, please share.
TYMorningCoffee|1 year ago
lijok|1 year ago
Composable modules such as `terraform-aws-lambda/modules/standard-function`, `terraform-aws-iam/modules/role-for-aws-lambda`, etc., which get composed for a specific use case in a root module (which we call a stack). Each stack has directories under it such as `dev/main/primary/`, `dev/sandbox-a/primary/`, `dev/sandbox-a/test-a/`, etc., where `dev` is the environment, `main`/`sandbox-a` is the tenant, and `primary`/`test-a` is the namespace. The namespaces contain a `tfvars` file and potentially some namespace-specific assets, READMEs, documentation, etc. The CD system then deploys the root module for each namespace present.
Stacks are then optionally (sometimes deeply) nested under parent directories, which are used for change control purposes, variable inference and consistency testing.
OpenTofu >1.8.0 is required for all of this to keep it nice and tidy.
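A rough sketch of what one such stack root might look like (sources, refs, and inputs are illustrative, not the commenter's actual code):

```hcl
variable "tenant"    { type = string }
variable "namespace" { type = string }

# A stack: composable modules wired together for one use case,
# deployed once per namespace directory with that namespace's tfvars.
module "role" {
  source = "git::https://example.com/terraform-aws-iam.git//modules/role-for-aws-lambda?ref=v2.1.0"
  name   = "${var.tenant}-${var.namespace}-fn"
}

module "function" {
  source   = "git::https://example.com/terraform-aws-lambda.git//modules/standard-function?ref=v5.0.2"
  name     = "${var.tenant}-${var.namespace}-fn"
  role_arn = module.role.arn
}
```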
carty7|1 year ago
dayallnash|1 year ago
spicyusername|1 year ago
I didn't understand the benefit of using Terragrunt. Modern Terraform supports all of the features Terragrunt was originally designed to work around, way back when.
Outside of being able to use variables in a few niche places where you can't in Terraform (which is easy to work around, and which, last I heard, is on the roadmap for OpenTofu), what does Terragrunt do that regular module imports in Terraform don't?
This may be anecdotal, but every Terragrunt repository I've ever seen was a mess of spaghetti trying too hard to stay DRY.
carty7|1 year ago
I spend a lot of time speaking with clients and found myself only partially understanding their organizational structures, so I dove in to collect my thoughts and put myself closer to what customers are navigating.
This gave me a refresher on how they organize their cloud infrastructure within their source control systems. I took a lens from the world of Terraform, since that's mostly the world I've lived in for the last few years.
I explored 10 different ways to structure your Terraform config roots, each promising scalability but delivering varying degrees of chaos. From single-environment simplicity to multi-cloud madness, customers are stuck navigating spaghetti directories and state file hell.
I probably missed things. Might have gotten things wrong. Take a look and let me know what you think.
What patterns are you using that I missed?
jjayj|1 year ago
This is split over hundreds of microservice repositories, each of which maintains its own Terraform.
We don't read state from other Terraform deployments; we use published reusable modules when convenient, and a tfvars file for every deployment.
At this point I can't imagine doing Terraform any other way.
unop|1 year ago
Nice! We'll link to this for our internal consultancy work.
It'd be nice to show the other dimension of the git branching strategies to apply. Github flow/feature-branches vs per-env branches of main vs git flow. How and when to apply changes in different environments - before vs after PRs, etc.
bloopernova|1 year ago
The client I'm contracted to is all-in on Terraform Cloud (TFC).
TFC uses workspaces, which, annoyingly, aren't the same thing as Terraform workspaces. I've divided our workspaces into dev, qa, staging, and prod, and each group of workspaces has OIDC set up to allow management of a specific cloud account, so dev workspaces can only access the dev account, and so on. Each grouping of workspaces also has a specific role that can access it, and each role has its own API key.
The issues I've run into are mostly around management of workspace variables. I now have a manager repo and matching workspace that controls all the vars for the couple hundred TFC workspaces. I use a TFC group API key for the Terraform Enterprise provider, one provider per group, which prevents mistakes like dev vars getting written to qa.
Workspace variables are set by a single directory of Terraform, so there's good sharing of the data and locals blocks.

I use lists of workspaces categorized into "pipeline deployers" and "application resource deployers", along with lists of dev, qa, staging, and prod workspaces. I then use Terraform's `setintersection` function to give me "dev pipeline" workspaces, "prod app" workspaces, etc. I do the same with groups of variables, as some are specific to pipeline workspaces, and so on. It works well, and it's nice to have almost 100% Terraform control of vars and workspaces.
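A minimal sketch of that intersection trick (workspace, organization, and variable names invented):

```hcl
locals {
  # Workspaces grouped by what they deploy...
  pipeline_workspaces = toset(["billing-pipeline", "search-pipeline"])

  # ...and by environment.
  dev_workspaces = toset(["billing-pipeline", "billing-app"])

  # Cross-cutting groups fall out of set intersections.
  dev_pipeline_workspaces = setintersection(local.pipeline_workspaces, local.dev_workspaces)
}

data "tfe_workspace" "dev_pipeline" {
  for_each     = local.dev_pipeline_workspaces
  name         = each.value
  organization = "example-org"
}

# Write a variable to every dev pipeline workspace.
resource "tfe_variable" "deploy_env" {
  for_each     = local.dev_pipeline_workspaces
  workspace_id = data.tfe_workspace.dev_pipeline[each.key].id
  key          = "DEPLOY_ENV"
  value        = "dev"
  category     = "env"
}
```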
I split app and pipeline workspaces based on historical decisions; I'm not sure I'd replicate that on a new project. The workflow there is that an app workspace creates the resources for a given deployment, then saves pertinent details to a couple of parameters. The pipeline workspace then pulls those parameters and uses them to create a pipeline that builds and deploys the code.
Unfortunately I can't share code from this particular setup, but I do intend to write about it "someday".
carty7|1 year ago
solatic|1 year ago
This is all beside the point that Terraform's biggest weakness is refactoring large workspaces into multiple smaller workspaces. Transitioning IDs from one workspace to another, at scale, is annoying to say the least. The only remotely feasible generic solution here would be to treat state files as tables, write migrations as SQL, and use pre-existing tooling for database migrations and rollbacks... Maybe I'll write something like that someday.
Uvix|1 year ago
The addition of `import` and `removed` blocks makes this a lot easier to manage than it was a year or two ago. You can manage the migration in the Terraform code rather than having to run separate state-management commands.
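For reference, the shape of those blocks (`import` needs Terraform >= 1.5, `removed` >= 1.7; the bucket is a placeholder). In the workspace giving the resource up:

```hcl
# Drop the resource from this state without destroying it.
removed {
  from = aws_s3_bucket.assets

  lifecycle {
    destroy = false
  }
}
```

And in the workspace adopting it:

```hcl
# Adopt the existing bucket into this state on the next apply.
import {
  to = aws_s3_bucket.assets
  id = "example-assets-bucket"
}

resource "aws_s3_bucket" "assets" {
  bucket = "example-assets-bucket"
}
```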
bilekas|1 year ago
> Multi-Environment Setup with Shared Modules
But the con saying "versioning is tricky across modules" undersells it: it's damn near impossible to manage reliably, especially because if I'm introducing a new variable to a shared module, I also need to add that variable to the inputs in each environment.
I haven't found a way to manage multiple versions of the modules across environments when they all use the same shared modules. Is it even possible?
moredhel|1 year ago
Define a default which is backwards compatible.
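It is possible to run different module versions per environment, since each environment's root module pins its own version; combined with a backwards-compatible default, nothing forces all environments to move at once. A sketch (names and versions invented):

```hcl
# In the shared module: the new input defaults to the old behaviour.
variable "enable_new_thing" {
  type    = bool
  default = false
}
```

```hcl
# In each environment's root module: pin the module version independently.
# prod/main.tf might stay on 1.4.0 while dev/main.tf moves to 1.5.0.
module "service" {
  source  = "app.terraform.io/example-org/service/aws"
  version = "1.4.0"
}
```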
carty7|1 year ago
NomDePlum|1 year ago
Good overview of the potential options. Which is most appropriate really depends.
I am a big fan of modularisation; it is possible to extend this approach to divide your infrastructure logically and mirror that division by separating out the Terraform state files too.
The number of TF deployments increases, but each has a smaller blast radius, and you now need to manage making the outputs of each build available to the deployments that depend on them.
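The usual mechanism for passing those outputs between the smaller states is a `terraform_remote_state` data source (bucket, key, and output names are placeholders):

```hcl
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-tf-state"
    key    = "network/terraform.tfstate"
    region = "eu-west-1"
  }
}

# Consume an output exported by the network stack.
resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0" # placeholder
  instance_type = "t3.micro"
  subnet_id     = data.terraform_remote_state.network.outputs.private_subnet_id
}
```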
TYMorningCoffee|1 year ago
dijit|1 year ago
Things I've seen:
- Python deterministically generating Terraform HCL files from YAML.
- Execution wrappers that encapsulate Terraform in CI/CD to parse the JSON output and prevent database deletion, but apply everything else.
- Scripts that pull every git repo and execute every Terraform file they can find while walking the directory tree.
Terraform is about 80% of the way to being a good tool; that last 20% is a ball-ache and gets solved totally differently every time. The best setups I've seen are where Terraform just “hands off” to something else after making a minimum of infrastructure.
But, otherwise, it can get incredibly messy.
cruffle_duffle|1 year ago
Terraform is such a weird product, and it's hard to describe why, really. It has really awesome things about it, like a functional way to pull in support for just about any provider's crazy thing. And the crazy part is that Terraform can easily configure local infrastructure, cloud infrastructure on almost any host, set up repos in GitHub, and then create a new auth provider in Auth0. The language is flexible enough to support all of it.
Yet as a language, it's quirky as heck. For example, how modules are basically wrappers on providers, and how different modules can almost "see inside" other modules to iron out dependency ordering, yet also can't. And speaking of which, circular dependencies suck to work around in a modular way without tearing half your structure apart.
Like I said, I'm nowhere close to an expert on Terraform and can only describe my limited experience building a fairly simple stack on top of it. The whole thing is just… both amazing and also weird and a bit frustrating. And I have yet to “grow” into multiple environments… lots of my complaints probably come down to my limited experience with it and, honestly, to there not being much out there in terms of best practices for maintaining scalable configuration (or maybe my ADD brain refuses to dive into that, who knows?).
My last adventure into infrastructure as code was with Puppet and Salt. All of that was provisioning on top of bare metal; it was all file operations, and the “provider-specific modules” were really just wrappers to nicely encapsulate things like nginx or apt. Perhaps it's Puppet and Salt's much more limited scope that kept me from feeling the same way.
I mean, Terraform can be used to configure just about anything that has an API, if you want. Maintaining a declarative language around that is bound to have its quirks.
michaelmcmillan|1 year ago
For environment-specific things, use conditionals:
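A minimal sketch of the usual pattern (the variable and values are invented):

```hcl
variable "environment" {
  type = string
}

locals {
  # Full size in prod, scaled down everywhere else.
  instance_type = var.environment == "prod" ? "m5.large" : "t3.small"
  replica_count = var.environment == "prod" ? 3 : 1
}
```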