Rethinking Infrastructure as Code from Scratch

102 points| dguo | 2 years ago |nathanpeck.com

120 comments

bob1029|2 years ago

I've been side-stepping this whole conversation by using as little infrastructure as possible. I'd absolutely be digging into IAC abstractions if I had a circus on my hands and no say over why it had to be a circus.

Going into a "cloud native" stance and continuing to micromanage containers, VMs, databases, message buses, reverse proxies, etc. seems absolutely ridiculous to me. We're now using exactly 2 major cloud components per region: A Hyperscale SQL database, and a FaaS runner. Both on serverless & consumption-based plans. There are zero VMs or containers in our new architecture. We certainly use things like DNS, AAD, VNets, etc., but they are mostly created incidentally by way of the primary offerings, and we only ever have to create them 3 times and it's done forever and ever - Dev cloud, Prod cloud, DR cloud. And yes - we are "mono cloud", because any notion of all of Azure/AWS/GCP going down globally and not also dragging the rest of the internet with it is fantasy to me (and our customers).

When you literally have one database to worry about for the entire universe, you stop thinking in terms of automation and start thinking in terms of strategic nuclear exchange. Granted, one big thing to screw up is a big liability, but only if you don't take extra precautions around process/procedure/backup/communication/etc.

The benefit of doing more with less also makes conversations around disaster recovery and compliance so much easier. Our DR strategy is async log replication of our 1 database. I really like the abstraction of putting 100% of the business into one place and it magically showing up on the other side of the flood event.

How about this for a litmus test: If your proposed solution architecture is so complicated that you would be driven to IAC abstractions to manage it, perhaps we need to re-evaluate the expectations of the business relative to the technology.

llama052|2 years ago

> Going into a "cloud native" stance and continuing to micromanage containers, VMs, databases, message buses, reverse proxies, etc. seems absolutely ridiculous to me.

Honestly you're just paying the cloud provider to manage these things behind the scenes for you. Which is fine but also has its own risks. If you can keep your product that simple for the business then that's pretty incredible.

I do suspect that's not a common situation though, at least in my experience.

nonameiguess|2 years ago

This is a nice perspective from the developer of a single application, but as a platform developer, I'm usually dealing with using IaC tooling to set up multi-tenant environments. I can't just deploy one database because there may be 50 different teams working on 50 different sets of problems, some of them basic research, some of them products, some of them purely exploratory, and there are often legal restrictions on who is even supposed to be able to make a network connection to a particular database, so simply using roles and users built into the DBMS engine itself isn't good enough to achieve the required separation, not to mention they need to be encrypted at rest with different keys. This often needs to be done across separate accounts within the same cloud provider for budgeting and accounting purposes as well, so they couldn't just share a resource even if it was otherwise okay for them to potentially step on each other's work.

codethief|2 years ago

My thoughts have been going into another direction entirely:

- We need to get rid of YAML. Not only because it's a horrible file format but also because it lacks proper variables, proper type safety, proper imports, proper anything. To this day, usage & declaration search in YAML-defined infrastructure still often amounts to a repo-wide string search. Why are we putting up with this?

- The purely declarative approach to infrastructure feels wrong. For instance, if you've ever had to work on Gitlab pipelines, chances are that already on day 1 you started banging your head against the wall because you realized that what you wanted to implement is not possible currently – at least not without jumping through a ton of hoops –, and there's already an open ticket from 2020 in Gitlab's issue tracker. I used to think, how could the Gitlab devs possibly forget to think of that one really obvious use case?! But I've come to realize that it's not really their fault: If you create any declarative language, you as the language creator will have to define what all those declarations are supposed to mean and what the machine is supposed to do when it encounters them. Behind every declaration lies a piece of imperative code. Unfortunately, this means you'll need to think of all potential use cases of your language and your declarations, including combinations and permutations thereof. (There's a reason why it's taken so long for CSS to solve even the most basic use cases.) Meanwhile, imperative languages simply let the user decide what they want. They are much more flexible and powerful. I realize I'm not saying anything new here but it often feels as if DevOps people have forgotten about the benefits of high-level programming languages. Now this is not to say we should start defining all our infrastructure in Java but let's at least allow for a little bit of imperativeness and expressiveness!

dloreto|2 years ago

I have a similar view to yours: as soon as you need variables, imports, functions or any other type of logic ... the existing "data-only" formats break down. Over time people either invent new configuration languages that enable logic (i.e. cue or jsonnet), or they try to bolt-in some limited version of these primitives into their configuration.

My personal take is that at some point you are better off just using a full programming language like TypeScript. We created TySON https://github.com/jetpack-io/tyson to experiment with that idea.

danielvaughn|2 years ago

I was only introduced to terraform a few months ago, but my own takeaway so far closely mirrors yours.

One thing that's sorely needed, especially for beginners, is something like a schema. Something that would provide editors with typical language features, but especially autocomplete. Maybe protobuf would work but I've also heard about some language called Cue that may be worth exploring as well.

I also feel that declarative is the wrong approach. Building infrastructure is inherently imperative and making the build process apparent in the code would go a long way towards readability. I'd love to be able to read through the terraform modules like I'm reading a story about how the system gets built.

oweiler|2 years ago

I think GitHub Actions got it right. Mostly declarative workflow, but with reusable Actions to extend the DSL.

mst|2 years ago

Without suggesting it solves most (if any) of your complaints, https://yglu.io is significantly less horrible than text templating YAML files like everybody else seems to want to do.

twic|2 years ago

Did that guy just suggest that to make infrastructure-as-code easier to understand we should make it more like CSS?

twic|2 years ago

What i think we really need:

1. A low-level, open-ended language for describing infrastructure; it should have absolutely no facilities for abstraction, should be human-legible and machine-readable (so based on JSON, probably), and should be applicable to everything from configuring physical hosts and switches up to containers.

2. For each kind of infrastructure, a tool which can apply that language to the infrastructure; one for AWS, one for physical hosts, one for Kubernetes i suppose, etc.

3. Tools and libraries for producing documents in that language from more expressive, concise sources; could be a YAML-to-language compiler, could be a classic Ruby DSL, could be a Python API, could be this guy's CSS idea, could be a GPT prompt, whatever.

Mostly, i want options for the last part to include libraries in sensible programming languages. Then i can just write real code, with full abstractive power, and the possibility of unit tests etc, to define my infrastructure, run it, and feed the output to the applier tool. No more enterprise YAML engineering. No more trying to shoehorn abstraction into Jinja2 templates. Just normal code.

Because the code produces the language, rather than operating on resource directly, writing a new library / DSL / whatever, based on a cool new model which will solve everyone's problems, becomes very easy. You don't have to build a whole IaC tool from scratch.

It also means you have an obvious and simple checkpoint to apply diffing, linting, security checks, etc. Not on the input code, but on the resulting document.

And it means you have one place you can always look to determine the ground truth of what is going on.
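
For illustration, a minimal sketch of the three layers described above: ordinary code produces a flat document, and checks run on that document rather than on the code. The document shape (`resources` with `kind`, `name`, `props`) and the function names are assumptions made up for this example, not an existing tool's format.

```python
import json

# Hypothetical low-level document format: a flat list of resources with
# no abstraction facilities. Field names are assumptions for illustration.

def web_server(name, size="small"):
    """Ordinary function acting as the expressive source layer."""
    return [
        {"kind": "host", "name": name, "props": {"size": size, "ports": [443]}},
        {"kind": "dns_record", "name": f"{name}.example.internal", "props": {"target": name}},
    ]

def render(resources):
    """Produce the canonical document that an applier tool would consume."""
    ordered = sorted(resources, key=lambda r: (r["kind"], r["name"]))
    return json.dumps({"resources": ordered}, indent=2, sort_keys=True)

def lint(document):
    """Check the rendered document, not the code that generated it."""
    problems = []
    for res in json.loads(document)["resources"]:
        if res["kind"] == "host" and 22 in res["props"].get("ports", []):
            problems.append(f"{res['name']}: port 22 is open")
    return problems

resources = web_server("api") + web_server("admin", size="large")
document = render(resources)
print(document)
print(lint(document))
```

Because `render` sorts resources and keys, two runs of the same code give byte-identical documents, which is what makes diffing the output a usable checkpoint.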

cptskippy|2 years ago

I think the part that upsets me is that he specifically calls out the problem CSS was meant to fix, but the current usage of CSS that he presents doesn't fix the problem; it inverts it. Instead of having to scour code for instances of an attribute you wish to change, you're scouring output for potential negative consequences of a 1px margin shift you added to a style used everywhere.

It seems like this will just amplify mistakes when a lowly dev tries to increase the available RAM of their resource and instead doubles the entire RAM allotment of a resource type for the entire enterprise.

NathanKP|2 years ago

Don't get me wrong CSS isn't perfect, but it has done a really good job of scaling through difficult problems as HTML, and browsers, and user expectations have grown over the decades. The tooling and CSS frameworks have gotten really good.

I think HTML + CSS is an example of a declarative system where you can start out not knowing very much about it, just drop in Bootstrap or Tailwind, and start getting great results by using prebuilt CSS classes from someone else.

This is what is missing in most modern infrastructure as code. Sure you can start out with prebuilt IaC templates from someone else, but these templates are basically like getting handed a big chunk of HTML that has inline styles on it. It might render great in the browser and look great, but it's hard to read, it's hard to understand why it works, and you'll have trouble adding on your changes to it without breaking things.

What I'd like to do is decouple the semantic aspects of infrastructure from the specific configuration aspect, similar to how HTML + CSS lets developers write their semantic markup with semantic CSS class names, and then have a CSS framework provide the exact styles that make it look pretty.

Infrastructure as code needs a similar standard library of semantic configuration mix-ins that you can apply to your infrastructure as layered mutations to produce the final result. There are many tools out there approaching this challenge right now from different ways, and I think the future of infrastructure as code is going to look quite different from what most people are doing today, more like HTML + CSS, or imperative code, than flat YAML and static structures that must list out all their own properties.

naikrovek|2 years ago

I think if you use CSS as a source of lessons learned after the mistakes were implemented and cemented, then yes. If you aim for where CSS was intended, and make decisions which do not compromise that direction, then it's a fine goal.

The layering of distinctly-defined concerns contained in separate files which collectively project a merged specification to an IaC tool is a good idea, I think.

jen20|2 years ago

What the author appears to miss is that many existing IAC tools permit exactly this.

CDK, CDKTF and Pulumi all use general purpose programming languages, so reusing parameter objects in the way that is described is trivial - indeed it is so close to second nature that I would not even think to write it down. Indeed, it's not uncommon to share functions that make such parameter objects via libraries in the package ecosystem of your choice.

I agree that IaC needs a rethink, but that is more to do with the fact that declarative systems simply cannot model the facts on the ground without being substantially more complex.

NathanKP|2 years ago

I'm the author of the article. I actually shared a prototype towards the end of the article, of the idea implemented in CDK. I agree that you can do a lot of this already in CDK, CDKTF, and Pulumi. I just don't think most people are actually doing it (yet).

I've been using CDK since early beta, and have actively contributed to the project. But most people that I'm seeing using it today are just wrapping up new higher abstractions with a simpler but limited API. I think that is an okay start, but I want to encourage people to think more about the infrastructure as being made up of traits/classes/adjectives that are mixed together to form the final product. The same way we have class inheritance in object oriented programming, or CSS classes in HTML.

Eventually the dream is to be able to provide a library of standard infrastructure as code mix-ins that can be applied to your cloud architecture. For example imagine if you could apply a generic "Graviton" trait to a CloudFormation or Terraform or Pulumi or CDK stack and it would automatically configure the appropriate properties on your EC2 instances, and your RDS database, and your Fargate tasks, and all your other compute. With CDK's built-in container and image builds it could even run your Dockerfile based build inside of the matching architecture as well, all based on a single trait that you add to your stack.

There are a wide variety of these types of "traits" that you might be able to build and add.
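
A rough sketch of the trait idea, using plain Python dictionaries rather than any real CDK, CloudFormation, Terraform, or Pulumi API. The resource records, property names, and the `apply_traits` helper are hypothetical; only the m5 to m6g family swap reflects real instance naming.

```python
import copy

# Hypothetical resource records; property names are illustrative only and
# do not follow any real provider schema.
stack = [
    {"type": "instance", "name": "web", "props": {"instance_type": "m5.large"}},
    {"type": "database", "name": "db", "props": {"instance_class": "db.m5.large"}},
    {"type": "bucket", "name": "assets", "props": {}},
]

def graviton(resource):
    """Trait: move compute resources to an Arm-based instance family."""
    props = resource["props"]
    if resource["type"] == "instance":
        props["instance_type"] = props["instance_type"].replace("m5.", "m6g.")
    elif resource["type"] == "database":
        props["instance_class"] = props["instance_class"].replace("db.m5.", "db.m6g.")

def encrypted(resource):
    """Trait: turn on encryption wherever data is stored."""
    if resource["type"] in ("database", "bucket"):
        resource["props"]["encrypted"] = True

def apply_traits(resources, traits):
    """Layer each trait over a copy of the stack, in order."""
    result = copy.deepcopy(resources)
    for trait in traits:
        for resource in result:
            trait(resource)
    return result

result = apply_traits(stack, [graviton, encrypted])
for resource in result:
    print(resource["name"], resource["props"])
```

Each trait decides for itself which resource types it cares about, so one trait name added to the stack touches every matching resource.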

substation13|2 years ago

Adding a "real" programming language makes certain things easier, such as abstraction, but IMO they are too powerful for the task at hand. Do we really want an infrastructure description to be able to execute arbitrary code?

JohnMakin|2 years ago

> I believe that infrastructure as code languages and tool assisted generators that we currently use are good, and they are taking steps in the right direction, but most of them are trying to patch over underlying complexity in a way that is fundamentally unscalable.

Sure, I can get behind this. Yesterday I was trying to figure out how to give a name to EC2 instances generated by AWS-managed autoscaler group that’s created by a node group resource. Simple, right? should just be able to add a Name = $tag field to the node group somewhere to apply to the generated ec2’s?

well, not quite. What you actually need is a separate autoscaling_group_tag resource.

Well, that resource needs a reference to an autoscaling group arn. but I dont manage an autoscaling group, my node group does, so in the end I have to figure out how to reference it like:

aws_node_group.node_group.resources.0.autoscaling_groups.0.arn

well, not quite, you may need a try block around that, and maybe some lifecycle rules to get around weird race conditions.

so yea. I’m not complaining about HCL or terraform. I find it much better than the alternatives. but lots of times my reaction to stuff like this is “there’s no way it has to be like this.”

intelVISA|2 years ago

> “there’s no way it has to be like this.”

It really doesn't, but alas: worship at the altar of unnecessary cloud complexity or be cast out to the on-prem Elysian field.

bovermyer|2 years ago

I don't think the author has tried Pulumi, which can do exactly this kind of thing.

mattpallissard|2 years ago

Or cdktf

> This is a bold statement I know. But I do not believe that infrastructure as code can ever get significantly simpler in its current form

Everything can be made easier to use. Pick the subset of functionality you care about and package it up as a library or module for other teams to use. This was how I paid the bills for years.

brodouevencode|2 years ago

He's suggesting something closer to Pulumi than to a declarative tool (CloudFormation, Terraform), but with more of an inheritance model to apply blanket attributes to the targeted resources. This is possible with Pulumi but requires a lot of boilerplate and some monkeypatching.

whoomp12342|2 years ago

this is great, but I would argue the biggest issue with infrastructure as code is this:

the structure and syntax for AWS is entirely different from Azure is entirely different from GCP.

Instead of abstracting to CSS, I would argue that what bytecode did in Java for multiple operating systems, we should do for infrastructure as code.

That way, you could easily replicate in different environments, free yourself from vendor lock-in, and have readability/re-usability all in one.

This is what I want from infrastructure as code and I have yet to see it.

jerf|2 years ago

"This is what I want from infrastructure as code and I have yet to see it."

If you sit down with the terraform specifications for an AWS instance, a GCP instance, and an Azure instance, and start trying to write that harmonization, you will rapidly discover why for yourself. Even just trying to specify a network setup and putting an instance on the public internet is impossible to harmonize, without making something so lowest-common-denominator it is almost useless, let alone anything complicated.

drewcoo|2 years ago

> the structure and syntax for AWS is entirely different from Azure is entirely different from GCP

This!

We're essentially writing locked-in vendorscript. What we want is an actual infrastructure language. One that lets us write once and deploy anywhere (nods to Java).

That would also allow us to standardize the way additional tooling (monitoring, logging, etc.) hooks into everything. It would allow us to easily deploy to new environment types as they become available (I keep hearing about the wonders of WASM). It would allow standardized ways of doing ops testing.

spc476|2 years ago

You don't want vendor lock-in. The vendor, however, has different incentives---why would they want to prevent lock-in to their service?

firesteelrain|2 years ago

You can do that through abstraction. You “include” your Terraform Azure Provider or Terraform AWS Provider. At the end of the day, your module needs to know what it’s interacting with but not the higher level of abstraction. We have done it at my work to make it cloud agnostic just in case we need to go to another CSP

Niksko|2 years ago

Some of the ideas here remind me of OAM: https://oam.dev/

OAM has a model of components (things like containerized workloads, databases, queues), traits (scaling behavior, ingress) and in the latest draft, policies that apply across the entire application (high availability, security policy).

It's all a little disjointed and seems to have lost steam. KubeVela is powering along, but it's the only implementation, and IMO it's highly opinionated about how you do deploys and works well for Alibaba and perhaps not for others. But it has some interesting ideas.

kristianpaul|2 years ago

The author seems to consider that the only infrastructure available these days is “the cloud”

zokier|2 years ago

I don't see how the majority of the stuff wouldn't be applicable for on-prem too?

xorcist|2 years ago

The split between parameterized classes and logic sounds a bit like the split between Puppet and Hiera. The idea was probably a good one, but something about the implementation made people go overboard with it.

I feel IaC really peaked around Puppet 3 and Chef 1. IaC should be simple enough that people use it, and trivial to write providers for. People tend to glue much too large libraries to their IaC platforms and end up with a maintenance mess which is what kills it in the long run. However both the above projects went corporate and grew legs and arms and a billion other features of which nobody uses more than a subset. Most people migrated to Ansible which kept more of the open source project culture and was simpler in design.

Now people seem to use a little of this, a little of that. Some Ansible, some Terraform, some other stuff. They don't know what they're missing when the entire stack is built from the ground up from templated components defined in a common declarative language. Some people seem to really like Nix, which I haven't used professionally, but from what I've seen it seems to inherit the same type of design. There was an experimental project called cfg which worked in real time using hooks such as inotify which was promising; if there was a Kubernetes distribution made like that it would be really easy to manage components that didn't belong to a host.

VectorLock|2 years ago

Chef and Puppet are configuration management systems, not really Infrastructure as Code.

onlypositive|2 years ago

What ever happened to saltstack?

cyberax|2 years ago

My problem with the current IAS systems is the state storage. It should not be needed! Instead, the IAS tool should introspect the systems it's managing and build the necessary state on the fly.

JohnMakin|2 years ago

This does not work.

Say I have resource A with property X=1 I define in IAC. Someone comes along and modifies X=2 outside of state. With your way, the IAC tool would see that change and think it was naturally part of the desired state, whereas stored state will catch the drift. And before anyone says “well dont modify outside of IAC” I say 1) that’s often impractical and 2) sometimes automation can modify resources outside of IAC beyond your control.

Also, dynamically creating state creates all sorts of concurrency issues, which is another nice thing about stored state, you can put a lock on it.
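
One way to read this argument, as a minimal sketch: with a recorded last-applied state, a tool can tell whether a mismatch came from the code changing or from someone changing the resource out of band. The `plan` function and its labels are hypothetical and are not how Terraform or any particular tool reports drift.

```python
def plan(desired, recorded, actual):
    """Classify each property using the code, the stored state, and reality."""
    actions = {}
    for key in sorted(set(desired) | set(recorded) | set(actual)):
        want, last, now = desired.get(key), recorded.get(key), actual.get(key)
        if now == want:
            actions[key] = "in sync"
        elif now != last:
            # Reality differs from what was last applied: someone else changed it.
            actions[key] = "drift: changed outside IaC"
        else:
            # Reality still matches the last apply, so the code must have moved.
            actions[key] = "update: code changed"
    return actions

desired = {"X": 1, "Y": 5, "Z": 1}   # what the IaC code says
recorded = {"X": 1, "Y": 3, "Z": 1}  # what the last apply wrote to state
actual = {"X": 2, "Y": 3, "Z": 1}    # what introspection finds right now

for key, action in plan(desired, recorded, actual).items():
    print(key, action)
```

Without `recorded`, properties X and Y would look the same to the tool: both simply differ from the code.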

wrs|2 years ago

That is how Puppet works. Introspect the current state, compare with the desired state, fix as needed. It mostly works, but in reality it will never reach the point of introspecting literally all of the current state. So there are always ways to subtly break things without the tool noticing. (E.g., a file object that ensures the correct path, contents, ownership, and mode, but doesn’t check xattrs or ACL. [That’s hypothetical, not how the actual Puppet file module works.])
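
The blind spot can be sketched as follows. Like the example in the comment, this is hypothetical and is not how the actual Puppet file module works; the `CHECKED` tuple and the `acl` value are made up for illustration.

```python
# Attributes this hypothetical file resource knows how to inspect and fix.
CHECKED = ("path", "content", "owner", "mode")

def reconcile(desired, actual):
    """Introspect, compare with the desired state, and fix what differs."""
    fixes = {key: desired[key] for key in CHECKED if actual.get(key) != desired[key]}
    actual.update(fixes)  # stand-in for really changing the file
    return fixes

desired = {"path": "/etc/app.conf", "content": "port=443\n", "owner": "root", "mode": "0644"}
actual = {
    "path": "/etc/app.conf",
    "content": "port=443\n",
    "owner": "root",
    "mode": "0666",
    "acl": "other:rw",  # never inspected, so never corrected
}

fixes = reconcile(desired, actual)
print(fixes)
print(actual["acl"])
```

The mode is repaired, but the ACL survives every run because it is outside the set of attributes the tool looks at.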

thorgaardian|2 years ago

A lot of people will argue that state helps protect against drift, but the real reason I find that you have to have state is to store values that won't be returned a second time and still construct and connect the graph of resources in the IaC templates. For example, if you declare the need for an RDS database and connect its output credentials into another application, you'll need state in order for the applies to work a second time because you'll never be able to retrieve the values from the target provider again.
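
A small sketch of that point, using a made-up `FakeProvider` in place of a real cloud API: the generated password is returned once at creation and is absent from later describe calls, so a second apply can only wire it into another resource if it was kept in state.

```python
import secrets

class FakeProvider:
    """Hypothetical stand-in for an API that reveals a secret only at creation."""

    def __init__(self):
        self._databases = {}

    def create_database(self, name):
        self._databases[name] = {"name": name, "status": "available"}
        # The password appears in the create response and nowhere else.
        return {**self._databases[name], "password": secrets.token_hex(8)}

    def describe_database(self, name):
        return dict(self._databases[name])

def apply(provider, state, name):
    """Create the database on the first run; reuse stored outputs afterwards."""
    if name not in state:
        state[name] = provider.create_database(name)
    return state[name]["password"]

provider, state = FakeProvider(), {}
first = apply(provider, state, "orders")
second = apply(provider, state, "orders")
print(first == second)
print(provider.describe_database("orders"))
```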

beders|2 years ago

Is this a trick to make infrastructure/devops engineers learn TypeScript?

But, hey, when looking at the origins of OOP and its main uses (back then: Simulations and UI): Maybe this is exactly what one needs to describe and setup infrastructure and there have been various projects going in that direction.

Make message passing truly async, throw in Garbage Collection, make it dynamic (i.e. creating a new instance of an object leads to some sort of deployment) and voila: Your traversable, introspect-able object graph is now a representation of your infrastructure.

danielovichdk|2 years ago

I use Azure.

When I need to provision anything I have a powershell script that interacts with Azure CLI.

My script sets up a new resource group for every service we create, logging, key vault, webapp/functions, and if needed some kind of data storage or queuing.

In my powershell script I can via a variable indicate which environment I want to spin up: dev, staging, prod.

I have one yaml file which is for my build and a build trigger which points to the above powershell script with the given environment.

All environments: dev, staging or prod are setup manually with manual user assignments for deployment access etc.

It's really lightweight but I also believe it's lightweight because we run a small services setup where each service takes care of its own provisioning.

Terraform and Yaml are so verbose but that's not the most problematic part. You can't execute those files from your local machine.

gnulinux|2 years ago

> Terraform and Yaml are so verbose but that's not the most problematic. You can't execute those files from your local machine.

Have you ever actually used terraform? You execute it from your own local computer, or from CI/CD. It runs in a compute resource you own, not the cloud provider.

jen20|2 years ago

> You can't execute those files from your local machine.

You can execute terraform from a local machine just as easily as a powershell script. I dare say you could even make it work with a shebang if you wanted (though I’ve never tried that).

flanked-evergl|2 years ago

> When I need to provision anything I have a powershell script that interacts with Azure CLI.

Sounds painful, as you have to make it declarative yourself, while Terraform (which runs perfectly fine on my local machine) is already declarative.

awoimbee|2 years ago

Just use pulumi (self-hosted in blob storage). Using a custom script is not a good choice when you could use the right tool for the job.

firesteelrain|2 years ago

You can execute Terraform from within Azure or a machine external to Azure. Just need a service principal

time0ut|2 years ago

The CSS like IaC language idea was not what I was expecting. It seems ok, but I was hoping for something deeper. What I mean is that I have always felt like there is tension or a mismatch between IaC and the underlying services in a more general sense. I’ve used CDK, Pulumi, Terraform, and CloudFormation and you can argue the merits of each. But they all kind of suck in the sense that you are programming a machine that was not really designed to be programmed. Sure AWS and all the rest have APIs to call and IaC is a decent abstraction over those APIs, but imagine if instead they exposed some lower level interface designed to execute IaC programs natively. I feel like that is the ultimate path to IaC that feels like actual programming.

brodouevencode|2 years ago

> The CSS like IaC language idea was not what I was expecting

I took it as a clumsy (yet somehow descriptive) analogy.

throwa23432|2 years ago

TypeScript will continue to dominate IaC since it has structural typing and set theoretic types and impeccable IDE/LSP support.

I use Terraform HCL 40 hours a week, but it is severely lacking in lang design and type system and IDE/LSP experience.

slotrans|2 years ago

I was with it right up until this statement:

> Centrally updatable: Sometimes best practice or corporate policy changes over time. You can update what LowCost or SecurityPolicy means later on, in one place, and that change will reapply to all resources that used it.

It sounds great but it's not. This is essentially the Fragile Base Class problem. You may _think_ that updating one of these traits in a single place will be safe and do what you want, but it may be disastrous for whoever is using it. And you're not going to find out until you deploy it.
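
To make the concern concrete, here is a minimal sketch with hypothetical trait definitions (`LowCost` and `SecurityPolicy` are the names from the article; the properties and the `render` and `blast_radius` helpers are invented). A one-line change to a shared trait silently changes every resource that relies on it, and only a comparison of the expanded output shows which ones.

```python
def render(resources, traits):
    """Expand trait names into concrete properties for every resource."""
    rendered = {}
    for name, spec in resources.items():
        props = {}
        for trait in spec["traits"]:
            props.update(traits[trait])
        props.update(spec.get("overrides", {}))  # local settings win over traits
        rendered[name] = props
    return rendered

def blast_radius(resources, old_traits, new_traits):
    """List resources whose final properties differ between two trait versions."""
    before = render(resources, old_traits)
    after = render(resources, new_traits)
    return {name: (before[name], after[name]) for name in resources if before[name] != after[name]}

resources = {
    "batch": {"traits": ["LowCost"]},
    "report": {"traits": ["LowCost", "SecurityPolicy"]},
    "cache": {"traits": ["LowCost"], "overrides": {"memory_gb": 8}},
    "api": {"traits": ["SecurityPolicy"]},
}
old_traits = {"LowCost": {"memory_gb": 2}, "SecurityPolicy": {"tls": "1.2"}}
new_traits = {"LowCost": {"memory_gb": 1}, "SecurityPolicy": {"tls": "1.2"}}

for name, (before, after) in blast_radius(resources, old_traits, new_traits).items():
    print(name, before, "->", after)
```

Whether halving memory is safe for `batch` and `report` is something the trait author cannot know from the central definition alone.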

agounaris|2 years ago

I kind of see this as just another yaml abstraction on top of configuration templates. This is not new or different in any way from what we have now.

fulafel|2 years ago

There's only so much you can do by building on the current towering abstractions offered by GCP, AWS, etc. One of the main problems is just the slowness of it all.

Dark is a good example of something that sidesteps this stuff by more fine grained integration of infrastructure and app code.

danw1979|2 years ago

A CSS-like language for IaC is literally the last thing I would have expected someone to suggest.

It’s an interesting idea. My initial reaction was “you can take my HCL from my cold dead hands” but I can’t seriously argue that Terraform is perfect and that I enjoy writing so much boilerplate.

ggeorgovassilis|2 years ago

The author proposes to split structure from parametrisation similarly to how HTML and CSS work.

dmarinus|2 years ago

Interesting comparison with the web browser stack! In that sense CDK (generating CloudFormation) is more like PHP (generating HTML). I wonder when "Virtual DOM"s, AJAX and CSS preprocessors get introduced to IaC.

iAm25626|2 years ago

Is there any similar solution with a self-hosting focus, for my own comprehensive data center infrastructure management plus cloud? Routers/switches, load balancers, firewalls, CGN, bare-metal servers, VMs, containers, applications, etc.

datahead|2 years ago

@cyberax said, "My problem with the current IAS systems is state storage. It should not be needed! Instead, the IAS tool should introspect the systems it's managing and build the necessary state on the fly."

@firesteelrain said, "you can do that through abstraction. You "include" your Terraform Azure Provider or Terraform AWS Provider. At the end of the day, your module needs to know what it’s interacting with but not the higher level of abstraction. We have done it at my work to make it cloud agnostic just in case we need to go to another CSP"

Single ops eng in a 3 person startup here. Ops eng is only one of my hats right now :) I found crossplane to be a solid tool for managing cloud inf. My assertion is that "the only multi-cloud is k8s" and crossplane's solution is "everything is a CRD". They have an extensive abstraction hierarchy over the base providers (GCP, TF, Azure, AWS, etc) so it's feasible to do what firesteelrain did. My client requirements span from- you must deploy into our tenant (could be any provider) to host this for us.

I can setup my particular pile of yaml and say - "deploy a k8s cluster, loadbalancers, ingress, deployments, service accounts (both provider and k8s), managed certs, backend configs, workload identity mgmt, IAP" in one shot. I use kustomize to stitch any new, isolated environment together. So far, it's been a help to have a single API style (k8s, yaml) to interact with and declaratively define everything. ArgoCD manages my deployments and provides great visibility to active yaml state and event logs.

I have not fully tested this across providers yet, but that's what crossplane promises with composite resource definitions, claims and compositions. I'm curious if any other crossplane users have feedback on what to expect when I go to abstract the next cloud provider.

cyberax's note on state management is what led me away from TF. You still have to manage state somewhere, and crossplane's idea was- k8s is already really good at knowing what exists and what should exist. Let k8s do it. I thought that was clever enough to go with it and I haven't been disappointed so far.

The model extends the k8s ecosystem, and allows you to keep going even into things like db schema mgmt. Check out Atlas k8s operator for schema migrations- testing that next...

I also like that I can start very simple, everything about my app defined in one repo- then as systems scale I can easily pull out things like "networking" or "data pipeline" and have them operating in their own deployment repo. Everything has a common pattern for IAC. Witchcraft.

formulathree|2 years ago

What do you guys think of combining IaC with regular code? One language that does it all?

jerf|2 years ago

The thing that I think this could run up against is that in HTML+CSS it is fairly common to take an element and apply a whole bunch of properties in coordination with each other. That is, I'm going to set similar margins and paddings and fonts and many other properties on each element, and there are a lot of broad similarities. This is where CSS variables come in; even if I'm applying a color to a lot of elements I'm probably pulling from a much smaller palette and if I change one of them I want to change all.

Cloud template definitions also have a lot of settings, but from what I can see, they are all different, all the time, for lots of good reasons. If I'm deploying a lot of different kinds of EC2 instances, I've got a whole bunch of settings that are going to be different for each type. Abstracting is a much different problem as a result. And it isn't just this moment in time, it's the evolution of the system over time, too. In code, overabstracting happens sometimes. In cloud architecture it is an all-the-time thing. It is amazingly easy to over-abstract into "hey this is our all-in-one EC2 template" and then whoops, one day I want to change the instance size for only one of my types of nodes, and now I either need to un-abstract that or add yet another parameter to my all-in-one EC2 template.

The inner platform effect is very easy to stumble into in the infrastructure code as a result, where you have your "all-in-one" template for resource X that, in the end, just ends up offering every single setting the original resource did anyhow.

By contrast, I've pondered the "focus on the links rather than the nodes" idea a few times, and there may be something there. However the big problem I see is that I like rolling up to a resource and having one place where either all the configuration is, or where there is a clear path for me to get to that point. Sticking with an instance just to keep things relatable, if I try to define an instance in terms of its relationship to the network, to the disk system, to the queues that it uses and the lambda it talks to and the autoscaling group it is a part of, now its configuration is distributed everywhere.

One possible solution I've often pondered is modifying the underlying configuration management system to keep track of where things come from, e.g., if you have a string that represents the name of the system you're creating, but it is travelling through 5 distinct modules on its way to the final destination, it would be great if there was a way of looking at the final resource and saying "where exactly did that name come from?" and it would tell you the file name and line number, or the set of such things that went into it. Then at least you could query the state of a resource, and rather than just getting a pile of values, you'd be able to see where they are coming from, dig into all the things that went into all the decisions, that might free you to do link-based configuration rather than node-based configuration. But you'd probably need an interactive explorer; if for instance the various links can configure the size of the underlying disk and you take the max() of the various sizes (or the sum or whatever), you'd need to be able to look at everything that went into the max and all the sources of those values; it's more complicated than just tracking atomic values through the system.

I've often wished for this even in just my small little configs I manage compared to some of you, and it is possible that this would be enough of an advantage to stand out in the crowd right now.

(I think the "track where values came from and how they were used in computation" could be retrofitted onto existing systems. "Focus on links rather than nodes" will require something new; perhaps something that could leverage an existing system but would require a new language at a minimum.)
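
A minimal sketch of the value-tracking idea, assuming a hypothetical `Tracked` wrapper; the file names and line numbers are invented. Each value carries the locations that contributed to it, and a combining operation such as max() keeps every input rather than just the winner.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tracked:
    """A value plus every (location, value) pair that fed into it."""
    value: object
    sources: tuple

def literal(value, where):
    """Wrap a value written directly in a config file."""
    return Tracked(value, ((where, value),))

def tracked_max(*items):
    """Take the largest value but remember all candidates, not only the winner."""
    best = max(items, key=lambda item: item.value)
    sources = tuple(src for item in items for src in item.sources)
    return Tracked(best.value, sources)

# Three links each ask for a minimum disk size on the same instance.
disk_gb = tracked_max(
    literal(50, "network.cfg:12"),
    literal(200, "queue.cfg:40"),
    literal(100, "lambda.cfg:7"),
)

print(disk_gb.value)
for where, value in disk_gb.sources:
    print(f"  {where} asked for {value}")
```

Answering "where did 200 come from?" then means reading `sources`, which is the kind of data an interactive explorer could present.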

gdsdfe|2 years ago

you lost me at css and typescript

ChoHag|2 years ago

[deleted]