batbomb | 6 years ago

A workflow language is only as good as its engine.

Nextflow was mentioned. I think what most people want is probably closer to Airflow, although it takes some time to get it production-ready in a cloud environment (hosted options include astronomer.io and a GCP product, Cloud Composer).

HTCondor via DAGMan has existed for a long time, and there are even engines built on top of it (Pegasus, Wings).

There’s Swift (http://swift-lang.org/main/) and its successor Parsl. Cray has Chapel. These are a bit different, in that they are more like a way of writing a distributed computer program. Of course, so is Julia, but built into these languages is the assumption that the computing resources you are running on may be unreliable in some way. Makeflow and GNU Parallel are closer to this category too.

Then there’s Beam, but that’s dataflow.

The crappy thing about this is that it’s hard to understand when to use a given solution and when not to. Why are there so many solutions? Because there’s a ton of different needs, and most of these tools focus on only a few in particular:

Latency

Scalability of workers

Dynamic Scalability of workers

Throughput

Polyglot

Integration with existing Schedulers

Workflow Code Management (container support)

Maintainability of very large DAGs

Testability of DAGs/Development support

Execution Management support/Web APIs

Error recovery (especially for long running workflows)

Re-execution capabilities

Provenance tracking

Domain Specificity

Data Management (next to data processing)

... the list goes on.
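To make two of those concrete (error recovery and re-execution), here is a minimal sketch of a toy engine in Python. It is not how any of the tools above work internally; `run_dag`, the task names, and the retry policy are all made up for illustration, and it only assumes the standard library's `graphlib` (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(tasks, deps, done=None, max_retries=2):
    """Run callables in dependency order (hypothetical toy engine).

    tasks: name -> zero-argument callable
    deps:  name -> set of task names that must finish first
    done:  names completed by an earlier run; they are skipped
    Returns (completed_names, name_of_failed_task_or_None).
    """
    done = set(done or ())
    # Give every task an entry so tasks without dependencies still run.
    graph = {name: deps.get(name, set()) for name in tasks}
    for name in TopologicalSorter(graph).static_order():
        if name in done:
            continue  # re-execution: do not repeat finished work
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                done.add(name)
                break
            except Exception:
                if attempt == max_retries:
                    # Give up, but report progress so a later run can resume.
                    return done, name
    return done, None


# Usage: "transform" fails once, then succeeds on the retry.
calls = []
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise RuntimeError("transient failure")
    calls.append("transform")


tasks = {
    "extract": lambda: calls.append("extract"),
    "transform": flaky,
    "load": lambda: calls.append("load"),
}
deps = {"transform": {"extract"}, "load": {"transform"}}
done, failed = run_dag(tasks, deps)
```

Passing the returned `done` set back into a second `run_dag` call skips everything that already finished. Real engines persist that state, run tasks in parallel, and track far more than this, which is roughly where the rest of the list comes in.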

djtriptych|6 years ago

Just regarding Airflow: unless Google has done a lot of work upgrading the internals since adopting Airflow as a supported cloud offering, I would think twice about using it.

It's amazing it works at all in my opinion.

This file [0] contains much of the complexity as a messy, stateful, monolithic block of Python. Having had to chase down deep bugs / limitations in this software, I'm now convinced that Python, with its GIL, dynamic typing, lack of concurrency primitives, and generally OOP / imperative style, is just the wrong tool for the job.

[0]: https://github.com/apache/airflow/blob/master/airflow/jobs/s...

thesorrow|6 years ago

I'm using Airflow for a lot of critical tasks and it works really well. But I agree that Python may not be the best language to implement a workflow engine.

kinow|6 years ago

>Because there’s a ton of different needs, and a lot of these focus on a few in particular:

Indeed! I am working on Cylc [1] right now, which is a cyclic workflow system, where users need more than a DAG.

It was created to automate weather forecast operations, but now there are a few cases of users trying to use it for cyclic graphs for more general problems.

[1] https://github.com/cylc/cylc-flow