top | item 37620733

Huggernaut | 2 years ago

Vito is at dagger.io now so hopefully we can expect some good stuff in the CI space there.

solatic | 2 years ago

Sadly, Dagger doesn't get it either. It's so focused on portability across underlying infrastructure providers - on not being the infrastructure provider - that it doesn't solve the real problem, which is the infrastructure provider itself.

(a) Consider Dagger's integration with GitHub Actions: https://docs.dagger.io/cookbook#github-actions where you still need to run setup-node, npm ci, etc. just to start the Dagger pipeline. So Dagger isn't saving you from having to deal with GitHub Actions' caching layer and always-blank starting point - it's unavoidable. Well, if I can't avoid it, why should I use Dagger in the first place - why not embrace it?

(b) Consider a use case where I want to parallelize the computation onto a number of machines dynamically chosen at run-time. Maybe I want to allow a test suite to run on an increasing number of machines without needing to periodically, manually increase the number of machines in a configuration file, or maybe I'm using Terraform workspaces where I want to run terraform apply for each workspace on a different VM to let the number of workspaces scale horizontally. This is fundamentally impossible with something like Dagger (also impossible in GitHub Actions) because it would require Dagger to communicate with the infrastructure provider to tell it to scale up compute to handle the parallel jobs, and then scale down once those jobs finish.

This was achievable with Concourse by having Concourse pipelines generate other Concourse pipelines, and running the underlying Concourse workers as an autoscaling Kubernetes StatefulSet/Deployment, combined with other Kubernetes machinery like the cluster autoscaler.

shykes | 2 years ago

Hi! Dagger co-founder here. I thought I’d share a few clarifying points - and acknowledge that we should explain certain aspects of Dagger’s design better, to avoid having to clarify in the first place!

> you anyway need to run setup-node, npm ci, etc. just to start the Dagger pipeline

You do need to write CI configuration to run a Dagger pipeline, but it’s a very small and standardized snippet (“install dagger, then run this command”) and it typically replaces a much larger custom mess of yaml and shell scripts.
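For illustration, that standardized snippet might look something like the following GitHub Actions workflow. This is a hedged sketch, not Dagger's official docs: the install URL, install location, and pipeline path are assumptions, so check Dagger's documentation for the current method.

```yaml
# .github/workflows/ci.yml -- illustrative only; the install script URL,
# bin path, and ci/pipeline.py entrypoint are assumed, not canonical.
name: ci
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # "install dagger, then run this command" -- the pipeline logic itself
      # lives in versioned code (e.g. ci/pipeline.py), not in this YAML
      - run: curl -fsSL https://dl.dagger.io/dagger/install.sh | sh
      - run: ./bin/dagger run python ci/pipeline.py
```

The point of the snippet is that it never changes as the pipeline evolves; all the churn happens in the pipeline code.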

The main benefit though is that your pipeline logic is decoupled from CI altogether. You can run the same Dagger pipeline post-push (in CI) but also pre-push. Similar to running a makefile or shell script, except it’s real code.

> Dagger isn't saving you from having to deal with GitHub Actions' caching layer and always-blank starting point

Dagger most definitely does do that :) We use Dagger and GitHub Actions ourselves, and have completely stopped using GHA’s caching system. Why bother, when Dagger caches everything automatically?

> Well, if I can't avoid it, why should I use Dagger in the first place - why not embrace it?

I think that’s your Stockholm syndrome talking. The terrible experience that is CI - the “push and pray” development loop; the drift between post-push yaml and pre-push scripts; the primitive caching system; the lack of a good composition system; the total impossibility of testing your pipelines - that pain is avoidable, and you shouldn’t have to embrace it. You deserve better!

> This is fundamentally impossible with something like Dagger (also impossible in GitHub Actions) because it would require Dagger to communicate with the infrastructure provider to tell it to scale up compute to handle the parallel jobs, and then scale down once those jobs finish.

This is possible with Dagger. It certainly shouldn’t be a core feature of the engine, but the beauty of a programmable system is that you can build infinite capabilities on top of it. You do need a truly programmable system though, which GitHub Actions is not.
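A hedged sketch of what "build it on top" could mean for the parallelization use case above. Plain asyncio tasks stand in for per-worker container runs, and the provision/release hooks are hypothetical places where your code would call an infrastructure provider's scale-up/scale-down API; none of this is Dagger's actual API.

```python
import asyncio

# Sketch: a programmable pipeline computes its own fan-out at runtime
# instead of reading a worker count from a config file.
async def run_shard(shard: list[str]) -> list[str]:
    await asyncio.sleep(0)  # placeholder for real work (e.g. a container exec)
    return [f"passed:{t}" for t in shard]

async def run_suite(tests: list[str], per_worker: int = 2) -> list[str]:
    # worker count derived from the workload itself (ceil division)
    n = max(1, -(-len(tests) // per_worker))
    shards = [tests[i::n] for i in range(n)]
    # a hypothetical provision(n) call would go here, and a matching
    # release() after gather() completes, to scale infra up and down
    results = await asyncio.gather(*(run_shard(s) for s in shards))
    return [r for shard in results for r in shard]

if __name__ == "__main__":
    print(asyncio.run(run_suite([f"t{i}" for i in range(5)])))
```

Because the fan-out is ordinary code, growing the test suite grows the worker count automatically, with no config file to edit.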

> This was achievable with Concourse by having Concourse pipelines generate other Concourse pipelines

Dagger pipelines can dynamically run new pipelines, at arbitrary depth. In other words nodes in the DAG can add more nodes at runtime.
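The "nodes adding nodes at runtime" idea can be sketched in plain Python (again standing in for an SDK; the discovery rule below is a toy): each node's work may discover child nodes, which are scheduled recursively to arbitrary depth.

```python
from typing import Callable

# Sketch: children of a node are only known once the node runs, so the
# DAG grows at runtime -- the dynamic-pipeline pattern described above.
def run_node(name: str, discover: Callable[[str], list[str]], out: list[str]) -> None:
    out.append(name)                 # do this node's work
    for child in discover(name):     # children discovered at runtime
        run_node(child, discover, out)

def discover(name: str) -> list[str]:
    # toy rule: "build" fans out into two test shards, leaves have no children
    return ["test-a", "test-b"] if name == "build" else []

if __name__ == "__main__":
    order: list[str] = []
    run_node("build", discover, order)
    print(order)  # prints "['build', 'test-a', 'test-b']"
```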

Give vito some credit, he is an incredibly talented engineer who built a product that you love. Maybe he saw in Dagger the potential to build something that you will love too :) He blogged about his thought process here: https://dev.to/vito/why-i-joined-dagger-43gb

I will concede that Dagger’s clustering capabilities are not great yet. Which is why we piggyback on CI infrastructure for that part… for now!