pacoWebConsult | 5 months ago

Can YAML go away entirely and instead allow pipelines to be defined with an actual language? What benefits does the runner-interpreted yaml-defined pipeline paradigm actually achieve? Especially with runners that can't be executed and tested locally, working with them is a nightmare.

jayd16|5 months ago

Why do we think an arbitrary language is easier to reason about? If it was so easy you could just do it now. The yaml could be extremely simple and just call into your app, but most don't bother.

I'm certainly willing to believe that yaml is not the ideal answer but unless we're comparing it to a concrete alternative, I feel like this is just a "grass is always greener" type take.

ok123456|5 months ago

You don't have to reason about them.

You write a compiler that enforces stronger invariants, above and beyond "everything is an array/string/list/number/pointer".

Good general-purpose programming languages provide type systems that do just this. It is criminal that the industry simply ignores this and chooses to use blobs of YAML/JSON/XML with disastrous results, creating ad-hoc programming languages without a type system in their chosen poison.
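A minimal Python sketch of that idea, under stated assumptions: `Step` and `Job` are hypothetical classes loosely modelled on GitHub Actions steps, not any vendor's API. A type checker such as mypy catches wrong field types statically, while `__post_init__` enforces cross-field rules (here, that a step sets exactly one of `run` or `uses`) as soon as the object is built, before any runner sees it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    """One CI step (hypothetical model, loosely shaped like a GitHub Actions step)."""
    name: str
    run: Optional[str] = None    # inline shell command
    uses: Optional[str] = None   # reusable action reference, e.g. "actions/checkout@v4"
    timeout_minutes: int = 10

    def __post_init__(self) -> None:
        # A rule a plain YAML mapping cannot express: exactly one of run/uses.
        if (self.run is None) == (self.uses is None):
            raise ValueError(f"step {self.name!r}: set exactly one of 'run' or 'uses'")
        if self.timeout_minutes <= 0:
            raise ValueError(f"step {self.name!r}: timeout_minutes must be positive")


@dataclass(frozen=True)
class Job:
    runs_on: str
    steps: tuple[Step, ...]

    def __post_init__(self) -> None:
        if not self.steps:
            raise ValueError("a job needs at least one step")
        names = [step.name for step in self.steps]
        if len(names) != len(set(names)):
            raise ValueError("step names must be unique within a job")


# Usage: the valid job builds; the invalid step fails immediately, not on the runner.
job = Job(runs_on="ubuntu-latest", steps=(
    Step(name="checkout", uses="actions/checkout@v4"),
    Step(name="test", run="pytest"),
))
try:
    Step(name="broken", run="make", uses="actions/checkout@v4")
except ValueError as err:
    print(err)  # step 'broken': set exactly one of 'run' or 'uses'
```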

VGHN7XDuOXPAzol|5 months ago

Is it actually possible to just have the YAML that calls into your app today, without losing the granularity or other important features?

I am not sure you can do this whilst having the granular job reporting (i.e. either you need one YAML block per job or you have all your jobs in one single 'status' item?) Is it actually doable?

iLoveOncall|5 months ago

There is a battle tested example of YAML vs programming languages in CloudFormation templates vs CDK.

I don't think anybody serious has any argument in favor of CloudFormation templates.

tracker1|5 months ago

I've done exactly this a few times... ensure my scripting host is present then use scripts for everything. I can use the same scripts locally without issue and they work the same on self-hosted runners.

Note: mostly using Deno these days for this, though I will use .NET/grate for db projects.
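A rough sketch of the "scripts for everything" pattern, written in Python rather than Deno purely for illustration; the file name `ci.py`, the task names and the placeholder commands are all hypothetical. The only CI-specific touch is optional: when the `GITHUB_ACTIONS` environment variable is `"true"`, it wraps each command in GitHub's `::group::`/`::endgroup::` log commands so the output folds in the web UI; locally it just runs the commands.

```python
#!/usr/bin/env python3
"""ci.py: a single entry point that both the CI YAML and developers call.

Hypothetical sketch: task names and commands are placeholders for your own.
"""
import argparse
import os
import subprocess
import sys
from typing import Optional

# Each task is an ordered list of commands (placeholders that just print).
TASKS: dict[str, list[list[str]]] = {
    "lint": [[sys.executable, "-c", "print('lint ok')"]],
    "test": [[sys.executable, "-c", "print('tests ok')"]],
}


def run_task(name: str) -> int:
    """Run every command of a task in order; stop at the first failure."""
    on_github = os.environ.get("GITHUB_ACTIONS") == "true"
    for cmd in TASKS[name]:
        if on_github:
            # GitHub Actions workflow command: fold this command's output in the log.
            print(f"::group::{name}", flush=True)
        result = subprocess.run(cmd)
        if on_github:
            print("::endgroup::", flush=True)
        if result.returncode != 0:
            return result.returncode
    return 0


def main(argv: Optional[list[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Run CI tasks locally or on a runner.")
    parser.add_argument("tasks", nargs="*", help="tasks to run: " + ", ".join(sorted(TASKS)))
    args = parser.parse_args(argv)
    if not args.tasks:
        parser.print_help()
        return 0
    unknown = [t for t in args.tasks if t not in TASKS]
    if unknown:
        parser.error(f"unknown task(s): {', '.join(unknown)}")  # exits with status 2
    for task in args.tasks:
        status = run_task(task)
        if status:
            return status
    return 0


if __name__ == "__main__":
    status = main()
    if status:
        sys.exit(status)  # a non-zero exit fails the CI step
```

Each workflow step then shrinks to something like `run: python ci.py test`, and `python ci.py test` on a laptop exercises the same code path.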

esafak|5 months ago

> If it was so easy you could just do it now.

Some do just that: dagger.io. It is not all roses but debugging is certainly easier.

biimugan|5 months ago

I agree somewhat with the proposition that YAML is annoying for configuring something like a workflow engine (CI systems) or Kubernetes. But having it defined in YAML is actually preferable in an enterprise context. It makes it trivial to run something like OPA policy against the configuration so that enterprise standards and governance can be enforced.

When something is written in a real programming language (that doesn't just compile down to YAML or some other data format), this becomes much more challenging. What should you do in that case? Attempt to parse the configuration into an AST and operate over the AST? But in many programming languages, the AST can become arbitrarily complex. Behavior can be implemented in such a way as to make it difficult to discover or introspect.

Of course, YAML can also become difficult to parse. If the system consuming the YAML supports in-band signalling -- i.e. proprietary non-YAML directives -- then you would need to first normalize the YAML using that system to interpret and expand those signals. But in principle, that's still at least more tractable than trying to parse an AST.
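To illustrate why the data form is easy to police: a rough Python stand-in for the kind of rule one would normally write in OPA's Rego. This is not OPA, and the function name and the rule itself are illustrative assumptions. It walks an already-parsed workflow (plain dicts and lists) and flags any `uses:` reference that is not pinned to a full 40-character commit SHA, skipping local (`./`) and `docker://` actions.

```python
import re
from typing import Any

# "owner/repo@<40 hex chars>": an action reference pinned to a full commit SHA.
SHA_PIN = re.compile(r"^[^@\s]+@[0-9a-f]{40}$")


def unpinned_actions(workflow: dict[str, Any]) -> list[str]:
    """Return one violation message per step whose `uses:` is not SHA-pinned."""
    violations = []
    for job_id, job in workflow.get("jobs", {}).items():
        for i, step in enumerate(job.get("steps", [])):
            ref = step.get("uses")
            if ref is None or ref.startswith(("./", "docker://")):
                continue  # `run:` steps and local/docker actions are out of scope here
            if not SHA_PIN.match(ref):
                violations.append(
                    f"jobs.{job_id}.steps[{i}]: '{ref}' is not pinned to a commit SHA"
                )
    return violations
```

The same check against a program that builds its pipeline at runtime would first have to execute or analyse that program, which is the commenter's point.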

catlifeonmars|5 months ago

> If the system consuming the YAML supports in-band signalling -- i.e. proprietary non-YAML directives -- then you would need to first normalize the YAML using that system to interpret and expand those signals.

cough CloudFormation cough

Charon77|5 months ago

Just run the code and see the output.

There are multiple ways to safely run untrusted code.

I for one enjoy how build.rs in Rust does it: you write Rust code that drives the build just by printing directives on stdout.

There are other ways of course
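The build.rs idea, transplanted to Python as a hedged sketch: the `ci:key=value` directive format is a made-up analogue of Cargo's `cargo:` lines, and `collect_directives` is a hypothetical helper. The parent runs arbitrary pipeline code in a child process and only trusts the structured lines it prints; everything else on stdout is treated as ordinary log output.

```python
import subprocess
import sys

# Hypothetical directive prefix, modelled on the `cargo:` lines a build.rs prints.
PREFIX = "ci:"


def collect_directives(script: str) -> list[tuple[str, str]]:
    """Run `script` in a separate Python process and parse `ci:key=value` lines from stdout.

    A separate process is isolation from the parent, not a security sandbox;
    genuinely untrusted code would need a container or similar on top.
    """
    proc = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True, text=True, check=True, timeout=60,
    )
    directives = []
    for line in proc.stdout.splitlines():
        if line.startswith(PREFIX):
            key, sep, value = line[len(PREFIX):].partition("=")
            if sep:  # skip malformed directives with no '='
                directives.append((key, value))
        # any other stdout line is just ordinary log output
    return directives


pipeline = """
for target in ["linux", "macos"]:
    print(f"ci:job=build-{target}")
print("compiling...")
"""
print(collect_directives(pipeline))  # [('job', 'build-linux'), ('job', 'build-macos')]
```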

0xbadcafebee|5 months ago

A custom language in GHA would be worse. You'd be limited by whatever language they supported, and any problems with it would have to go through their support team. It adds more burden on GHA (they'd spend more time and money on support) without creating value (new features you want).

You already don't have to use YAML. Use whatever language you want to define the configuration, and then dump it as YAML. By using your own language and outputting YAML, you get to implement any solution you want, and GitHub gets to spend more cycles building features.

Simple example:

  1. Create a couple of inherited Python classes
  2. Write class functions to enable/disable GHA features and validate them
  3. Have the functions store data in the class object
  4. Use a library to output the class as YAML
  5. Now craft your GHA config by simply calling a Python object
  6. Run the code, save the output file, apply it to your repo

I don't know why nobody has made this yet, but it wouldn't be hard. Read GHA docs, write Python classes to match, output as YAML.
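A compressed sketch of those steps, using a single builder class instead of an inheritance hierarchy for brevity; `Workflow` and its methods are hypothetical, not an existing library. One assumption worth flagging: it emits JSON, which is essentially valid YAML 1.2 flow syntax, so no YAML library is needed and any YAML parser (GitHub's included) should accept it; swap in a YAML emitter if you want the usual block style.

```python
import json
from typing import Any, Optional


class Workflow:
    """Hypothetical builder for a GitHub Actions workflow (not an existing library)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.triggers: dict[str, Any] = {}
        self.jobs: dict[str, dict[str, Any]] = {}

    def on_push(self, *branches: str) -> "Workflow":
        self.triggers["push"] = {"branches": list(branches)} if branches else None
        return self

    def on_pull_request(self) -> "Workflow":
        self.triggers["pull_request"] = None  # same as a bare `pull_request:` key
        return self

    def job(self, job_id: str, runs_on: str = "ubuntu-latest") -> "Workflow":
        if job_id in self.jobs:
            raise ValueError(f"duplicate job id {job_id!r}")
        self.jobs[job_id] = {"runs-on": runs_on, "steps": []}
        return self

    def step(self, job_id: str, *, name: str,
             run: Optional[str] = None, uses: Optional[str] = None) -> "Workflow":
        if job_id not in self.jobs:
            raise KeyError(f"unknown job {job_id!r}; call job() first")
        if (run is None) == (uses is None):
            raise ValueError(f"step {name!r}: set exactly one of run/uses")
        step: dict[str, Any] = {"name": name}
        if run is not None:
            step["run"] = run
        else:
            step["uses"] = uses
        self.jobs[job_id]["steps"].append(step)
        return self

    def to_dict(self) -> dict[str, Any]:
        # Validation happens here, before anything is written or pushed.
        if not self.triggers:
            raise ValueError("workflow has no triggers")
        for job_id, job in self.jobs.items():
            if not job["steps"]:
                raise ValueError(f"job {job_id!r} has no steps")
        return {"name": self.name, "on": self.triggers, "jobs": self.jobs}

    def dump(self) -> str:
        # JSON output quotes every key and string, so YAML's implicit typing never kicks in.
        return json.dumps(self.to_dict(), indent=2)


wf = (Workflow("ci")
      .on_push("main")
      .on_pull_request()
      .job("test")
      .step("test", name="checkout", uses="actions/checkout@v4")
      .step("test", name="run tests", run="pytest"))
print(wf.dump())  # save as e.g. .github/workflows/ci.yml
```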

If you want more than GHA features support [via configuration], use the GHA API (https://docs.github.com/en/rest/actions) or scripted workflows feature (https://github.com/actions/github-script).

doublet00th|5 months ago

This is what we built at my last startup Pangea (except with GitLab CI/CD). It worked pretty well and I always wanted to open-source it, but alas in a startup there's never enough time.

tarkaTheRotter|5 months ago

Hey. I'm currently building Typeflows to solve this (among a few other pain points), and am planning to make it available for the JVM (this exists now), TypeScript and Python at least.

There are existing solutions around, but they miss a bunch of things that are blatantly lacking in the space:

- workflow visualisations (this is already working - you can see an example of workflow relationship and breakdowns on a non-trivial example at https://github.com/http4k/http4k/tree/master/.github/typeflo...);

- running workflows through an event simulator so you can tell cause and effect when it comes to what triggers what. Testing workflows anyone? :)

- security testing on workflows - to avoid the many footguns that there are in GHA around secrets etc;

- compliance tests around permitted Action versions;

- publishing of reusable repository files as binary dependencies that can be upgraded and compiled into your projects - including not just GHA actions and workflows but also things like version files, composable Copilot/Claude/Cursor instruction files;

- GitLab, CircleCI, Bitbucket, Azure DevOps support using the same approach and in multiple languages;
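The event-simulator idea can be illustrated in a few lines. This is a hypothetical Python sketch, not Typeflows' implementation or API: it only models push events with `branches`/`branches-ignore` filters over already-parsed workflow dicts, and it uses `fnmatch` rules, where `*` also matches `/` (GitHub's own filter syntax differs there).

```python
from fnmatch import fnmatchcase
from typing import Any


def _normalise_on(on: Any) -> dict[str, Any]:
    """`on:` may be a string, a list of event names, or a mapping; turn it into a mapping."""
    if isinstance(on, str):
        return {on: None}
    if isinstance(on, list):
        return dict.fromkeys(on)
    return dict(on or {})


def triggered_by_push(workflows: dict[str, dict[str, Any]], branch: str) -> list[str]:
    """Names of workflows whose push trigger would fire for a push to `branch`."""
    fired = []
    for name, wf in workflows.items():
        events = _normalise_on(wf.get("on"))
        if "push" not in events:
            continue
        push = events["push"] or {}
        include = push.get("branches")
        exclude = push.get("branches-ignore", [])
        if include is not None and not any(fnmatchcase(branch, p) for p in include):
            continue
        if any(fnmatchcase(branch, p) for p in exclude):
            continue
        fired.append(name)
    return fired


workflows = {
    "ci": {"on": {"push": {"branches": ["main", "release/*"]}}},
    "docs": {"on": {"push": {"branches-ignore": ["main"]}}},
    "nightly": {"on": {"schedule": [{"cron": "0 3 * * *"}]}},
    "lint": {"on": ["push", "pull_request"]},
}
print(triggered_by_push(workflows, "main"))         # ['ci', 'lint']
print(triggered_by_push(workflows, "release/1.0"))  # ['ci', 'docs', 'lint']
```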

Early days yet, but I'm planning to make it free for OSS and paid for commercial users. I'm also dogfooding it on one of my other open source projects to make sure that it can handle non-trivial cases. Lots to do, and hopefully it will be valuable enough for commercial companies to pay for!

Wish me luck!

https://typeflows.io/

rickette|5 months ago

GitHub Actions originally supported HCL (Hashicorp Configuration Language) instead of YAML. But the YAML force was too strong: https://github.blog/changelog/2019-09-17-github-actions-will....

nothrabannosir|5 months ago

HCL is same s**, different smell. Equally hamstrung. It’s the reason hashicorp came out with an actually programmable version of the hcl semantics: CDKTF.

freeplay|5 months ago

If you have worked with HCL in any serious capacity, you'll be happy they didn't go that route.

Here are some fun examples of why HCL sucks:

- Create an if/elseif/else statement

- Do anything remotely complex with a for loop (tip: you're probably going to have to use `flatten` a lot)

imiric|5 months ago

Agreed. YAML is not a great format to begin with, but using it for anything slightly more sophisticated (looking at you Ansible, k8s, etc.) is an exercise in frustration.

I really enjoyed working with the Earthfile format[1] used for Earthly CI, which unfortunately seems like a dead end now. It's a mix of Dockerfile and Makefile, which made it very familiar to read and write. Best of all, it allowed running the pipeline locally exactly as it would run remotely, which made development and troubleshooting so much easier. The fact that GH Actions doesn't have something equivalent is awful UX[2].

Honestly, I wish the industry hadn't settled on GitHub and GH Actions. We need better tooling and better stewards of open source than a giant corporation who has historically been hostile to open source.

[1]: https://earthly.dev/earthfile

[2]: Yes, I'm aware of `act`, but I've had nothing but issues with it.

Pxtl|5 months ago

Yes. Most of my custom pipeline stuff is a thin wrapper around a normal-ass scripting-language because the yaml/macro stuff is so hard to check and debug.

jbjbjbjb|5 months ago

I couldn’t agree more. I think we should just write our pipelines in languages our teams are familiar with and prioritise being able to run them locally.

delusional|5 months ago

> prioritise being able to run them locally.

That is the key function any serious CI platform needs to tackle to get me interested. FORCE me to write something that can run locally. I'll accept using containers, or maybe even VMs, but make sure that whatever I build for your server ALSO runs on my machine.

I absolutely detest working on GitHub Actions because all too often it ends up requiring that I create a new repo where I can commit to master (because for some reason everybody loves writing actions that only work on master). Which means I have to move all the fucking secrets too.

Solve that for me PLEASE. Don't give me more YAML features.

bigstrat2003|5 months ago

I agree. I like YAML for a lot of things, but this is very much not one of them. CI pipelines are sufficiently complex that you will very quickly exceed the capabilities of "it's just a simple plain text markup". You need a real programming language.

soraminazuki|5 months ago

A JSON-like language with functions is the answer here. When it comes to describing large, complex, and sometimes repetitive data, having a declarative language with proper tools for abstraction helps so much with readability, writability, and maintainability.

I've seen a few thousand lines of YAML with anchors riddled all over the place. It was impossible to deal with. Rewriting it in Jsonnet paid off immediately.
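The same idea in Python rather than Jsonnet, as a hedged analogy: the target names, the `builder-<target>` container images and the `build_job` helper are hypothetical. Where YAML would copy a block around via an anchor, a function takes the varying parts as parameters, so every difference between jobs is an explicit argument.

```python
from typing import Any, Optional


def build_job(target: str, *, extra_steps: tuple[str, ...] = (),
              overrides: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Shared job shape for one build target, instead of an anchored YAML block."""
    job: dict[str, Any] = {
        "runs-on": "ubuntu-latest",
        "container": f"builder-{target}",  # hypothetical image naming scheme
        "steps": [
            {"uses": "actions/checkout@v4"},
            {"run": f"make TARGET={target}"},
            *({"run": cmd} for cmd in extra_steps),
        ],
    }
    job.update(overrides or {})  # per-call differences are explicit arguments
    return job


jobs = {f"build-{t}": build_job(t) for t in ("x86_64", "aarch64")}
jobs["build-riscv64"] = build_job(
    "riscv64", extra_steps=("make qemu-test",), overrides={"timeout-minutes": 90}
)
```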

Another example is Nixpkgs. It's quite pleasant to deal with despite the size of its codebase.

ericHosick|5 months ago

Yes! Hopefully a language that supports code as data (homoiconicity).

rurban|5 months ago

There is yamlscript for you now :) No security of course.

Jokes aside, I like proper YAML anchors. Other CIs do support them, and they made writing YAML actions much easier, especially complicated cross-building recipes with containers and QEMU.

zft|5 months ago

Jenkins supports Groovy DSL jobs. I would not say using it made anything easier.

oblio|5 months ago

Well, Groovy is a bit of a basket case programming language, so that doesn't help.

I say this as someone that built entire Jenkins Groovy frameworks for automating large Jenkins setups (think hundreds of nodes, thousands of Jenkins jobs, stuff like that).

ZYbCRq22HbJ2y7|5 months ago

You could make a builder to do this for you. It could build your actions in a pre-commit hook or whatever.

Although, I think it is generally accepted practice to prefer declarative configuration over imperative configuration? Maybe that is part of what the article is getting at?

baq|5 months ago

YAML is neither declarative nor imperative. It's just a tree (or graph, with references) serialization to text.

wiether|5 months ago

Basically what we ended up doing at work is creating some kind of YAML generator.

We write Bash or Python, and our tool will produce the YAML pipeline reflecting it.

So we don't need to maintain YAML in an over-complicated format.

The resulting YAML is not meant to be read by an actual human since it's absolute garbage, but the code we want to run runs when we want, without our having to maintain the YAML.

And we can easily test it locally.

easterncalculus|5 months ago

I work on a monorepo that does this using TypeScript, for type checking. It's a mess. There's a huge learning curve for some type checking, and very often a change will build perfectly fine but fail a type check in CI.

Honestly, just having a linter should be enough. Ideally, anything complicated in your build should just be put into a script anyways - it minimizes the amount of lines in that massive YAML file and the potential for merge conflicts when making small changes.
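Dedicated linters such as actionlint cover syntax and expression checks; a team-specific rule like "keep inline scripts short" is only a few lines on top. A hypothetical Python sketch over an already-parsed workflow, where the function name and the five-line threshold are arbitrary assumptions:

```python
from typing import Any

MAX_INLINE_LINES = 5  # arbitrary threshold; tune per team


def long_inline_scripts(workflow: dict[str, Any], limit: int = MAX_INLINE_LINES) -> list[str]:
    """Flag `run:` blocks longer than `limit` lines as candidates for a script file.

    `workflow` is the already-parsed YAML (plain dicts and lists).
    """
    findings = []
    for job_id, job in workflow.get("jobs", {}).items():
        for i, step in enumerate(job.get("steps", [])):
            script = step.get("run")
            if not isinstance(script, str):
                continue
            n_lines = len(script.strip().splitlines())
            if n_lines > limit:
                findings.append(
                    f"jobs.{job_id}.steps[{i}]: {n_lines}-line run block "
                    f"(limit {limit}); consider moving it into a script"
                )
    return findings
```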

mhh__|5 months ago

I'm not convinced there should be anything to define at all versus basically just some extremely broad but bare platform and a slot to stick an executable in.

AaronAPU|5 months ago

I generate all my GH YAML files via Python. The thought of writing them by hand makes me want to vomit, one of the best design choices I ever made.

verdverm|5 months ago

I use CUE and generate the YAML, so I don't care anymore what giant unreadable slop it is.

I use CUE to read yamhell too

giancarlostoro|5 months ago

Wouldn't Terraform solve this? You can have all your infrastructure as code in a git repo.

red_hare|5 months ago

I'm surprised by this take. I love YAML for this use case. Easy to write and read by hand, while also being easy to write and read with code in just about every language.

baq|5 months ago

YAML is a serialization format. I like YAML as much as I like base64, that is I don't care about it unless you make me write it by hand, then I care very much.

GitHub Actions have a lot of rules, logic and multiple sublanguages in lots of places (e.g. conditions, shell scripts, etc.). YAML is completely superficial; XML would be an improvement due to less whitespace sensitivity alone.

pacoWebConsult|5 months ago

Sure, easy to read, but quite difficult to /reason/ about in your head, let alone have proper language server/compiler support given the abstraction over provider events and runner state. I have never written a CI pipeline correctly without multiple iterations of pushing updates to the pipeline definition, and I don't think I'm alone on that.

shadowgovt|5 months ago

Easy to write and read until it gets about a page or two long. Then you have to figure out stuff like "Oh gee, I'm on nesting layer 18, so that's... The object.... That is.... The array of.... The objects of....."

Plus it has exactly enough convenience-feature-related sharp edges to be risky to hand to a newbie, while wearing the dress of something that should be too bog-simple to have that problem. I, too, enjoy languages that arbitrarily decide the Norwegian TLD is actually a Boolean "false."

Pxtl|5 months ago

It's less about YAML itself than the MS yaml-based API for interacting with build-servers. It's just so hard to check and test and debug.

TheDong|5 months ago

> Easy to write and read by hand, while also being easy to write and read with code in just about every language

Language implementations for yaml vary _wildly_.

What does the following parse as:

    some_map:
      key: value
      no: cap

If I google "yaml online" and paste it in, one gives me:

    {'some_map': {False: 'cap', 'key': 'value'}}

The other gives me:

    {'some_map': {'false': 'cap', 'key': 'value'}}

... and neither gives what a human probably intended, huh?
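Much of the divergence comes from which schema a parser implements. A toy Python resolver, covering only booleans and assuming a parser that follows one spec version exactly (real libraries differ in details; some YAML 1.1 implementations, for instance, do not treat `y`/`n` as booleans), shows why `no` flips. The first output above is consistent with YAML 1.1 rules, under which `no` is the boolean false; under the YAML 1.2 core schema it stays a string.

```python
from typing import Union

# Unquoted scalars that resolve to booleans, per spec version (toy model: bools only).
YAML11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML11_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}
YAML12_TRUE = {"true", "True", "TRUE"}      # YAML 1.2 core schema
YAML12_FALSE = {"false", "False", "FALSE"}


def resolve_plain(scalar: str, version: str = "1.2") -> Union[bool, str]:
    """Return True/False if `scalar` is a boolean under `version`, else the string itself."""
    if version == "1.1":
        true_set, false_set = YAML11_TRUE, YAML11_FALSE
    else:
        true_set, false_set = YAML12_TRUE, YAML12_FALSE
    if scalar in true_set:
        return True
    if scalar in false_set:
        return False
    return scalar


for key in ("key", "no", "NO"):
    print(key, "->", repr(resolve_plain(key, "1.1")), "|", repr(resolve_plain(key, "1.2")))
# key -> 'key' | 'key'
# no -> False | 'no'
# NO -> False | 'NO'
```

Quoting the key (`"no": cap`) makes it a string under either rule set, which is the usual defensive fix.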