batbomb | 6 years ago
Nextflow was mentioned. I think what most people want is probably closer to Airflow, although it takes some time getting it up to production in a cloud environment (there is astronomer.io and a GCP product).
HTCondor via DAGMan has existed for a long time, and there are even engines built on it (Pegasus, Wings).
There’s Swift (http://swift-lang.org/main/) and its successor Parsl. Cray has Chapel. These are a bit different, in that they are more like languages for writing a distributed computer program. Of course, so is Julia, but built into these languages is the assumption that the computing you run on may be unreliable in some way. Makeflow and GNU Parallel are closer to this category too.
Then there’s Beam, but that’s dataflow.
The crappy thing about this is that it’s hard to tell when to use a given solution and when not to. Why are there so many solutions? Because there’s a ton of different needs, and a lot of these tools focus on a few in particular:
Latency
Scalability of workers
Dynamic Scalability of workers
Throughput
Polyglot
Integration with existing Schedulers
Workflow Code Management (container support)
Maintainability of very large DAGs
Testability of DAGs/Development support
Execution Management support/Web APIs
Error recovery (especially for long running workflows)
Re-execution capabilities
Provenance tracking
Domain Specificity
Data Management (next to data processing)
... the list goes on.
djtriptych|6 years ago
It's amazing it works at all in my opinion.
This file [0] contains much of the complexity as a messy, stateful, monolithic block of Python. Having had to chase down deep bugs / limitations in this software, I'm now convinced that Python, with its GIL, weak typing, lack of concurrency primitives, and generally OOP / imperative style, is just the wrong tool for the job.
[0]: https://github.com/apache/airflow/blob/master/airflow/jobs/s...
j88439h84|6 years ago
https://trio.readthedocs.io is an extremely good Python concurrency library based on the model of Structured Concurrency (https://vorpus.org/blog/notes-on-structured-concurrency-or-g...).
The typing issues are far improved in current Python with annotations and attrs/dataclasses.
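To make the annotations/dataclasses point a bit more concrete, here is a minimal sketch using only the standard library. `TaskSpec`, its fields, and `describe` are hypothetical names for illustration, not taken from Airflow or any other project. Annotations are not enforced at runtime, so this assumes a static checker such as mypy is doing the actual checking.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical description of one workflow task (illustration only)."""
    name: str
    retries: int = 0
    upstream: List[str] = field(default_factory=list)


def describe(task: TaskSpec) -> str:
    # A static checker can flag callers that pass something other than a TaskSpec.
    deps = ", ".join(task.upstream) or "nothing"
    return f"{task.name} (retries={task.retries}) depends on {deps}"


extract = TaskSpec(name="extract")
load = TaskSpec(name="load", retries=2, upstream=["extract"])
print(describe(load))
```

The `frozen=True` flag also makes instances read-only after construction, which helps a little with the "messy, stateful" complaint, at least for plain data records.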
thesorrow|6 years ago
kinow|6 years ago
Indeed! I am working on Cylc [1] right now, which is a cyclic workflow system, where users need more than a DAG.
It was created to automate weather forecast operations, but now there are a few cases of users trying to use it for cyclic graphs for more general problems.
[1] https://github.com/cylc/cylc-flow
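One rough way to picture a cycling workflow is as a small task pattern that repeats, with some dependencies reaching back into the previous cycle. The sketch below unrolls a fixed number of cycles into an ordinary finite graph and orders it with the standard library. It is not Cylc's API or how Cylc is implemented; the task names (`fetch`, `model`, `publish`) and the `unroll` helper are made up for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def unroll(cycles: int) -> dict:
    """Unroll a repeating three-task pattern into one finite graph.

    Returns a mapping of task -> set of tasks it waits on.
    """
    graph = {}
    for c in range(cycles):
        graph[f"fetch.{c}"] = set()
        graph[f"model.{c}"] = {f"fetch.{c}"}
        graph[f"publish.{c}"] = {f"model.{c}"}
        if c > 0:
            # Inter-cycle dependency: each model run waits on the previous cycle's model.
            graph[f"model.{c}"].add(f"model.{c - 1}")
    return graph


graph = unroll(3)
order = list(TopologicalSorter(graph).static_order())
print(order)
```

A real cycling system would not stop at a fixed number of cycles, which is part of why a plain DAG tool is an awkward fit for this kind of workload.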