Ask HN: What is the correct way to deal with pipelines?
1 point | vsroy | 2 years ago
This is not a massive data-engineering project. I just need to ensure the steps get run to completion.
My current plan is to use Redis lists as queues and have work move from one queue to the next as each step finishes, but I'm wondering if there's a better way.
lantry|2 years ago
- https://camel.apache.org/
- https://www.windmill.dev/
- https://github.com/huginn/huginn
Your idea about a queue (in Redis, Postgres, SQLite, etc.) is also totally valid. The off-the-shelf tools I listed probably wouldn't give you a huge advantage, IMO.
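For what it's worth, here is a rough sketch of the queue idea using SQLite from the Python standard library. The table layout, the step names in `STEPS`, and the function names are all made up for illustration, and it assumes a single worker process. Each job records which step it is on, and the step counter only advances after the handler returns, so a crash mid-step leaves the job in place to be retried.

```python
import sqlite3

# Hypothetical step names; replace with whatever your pipeline actually does.
STEPS = ["fetch", "transform", "store"]


def init(conn):
    """Create the jobs table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " step INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()


def enqueue(conn, payload):
    """Add a new job that starts at the first step."""
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))


def run_next(conn, handlers):
    """Advance the oldest unfinished job by one step.

    Returns False when there is nothing left to do.
    """
    row = conn.execute(
        "SELECT id, payload, step FROM jobs WHERE step < ? ORDER BY id LIMIT 1",
        (len(STEPS),),
    ).fetchone()
    if row is None:
        return False
    job_id, payload, step = row
    # If the handler raises, the row is left untouched and the same
    # step is attempted again on the next call.
    result = handlers[STEPS[step]](payload)
    with conn:
        conn.execute(
            "UPDATE jobs SET payload = ?, step = ? WHERE id = ?",
            (result, step + 1, job_id),
        )
    return True
```

You would call `run_next` in a loop (or from cron) with a dict mapping each step name to a function. Note that a job whose handler keeps failing will block the jobs behind it in this sketch; a retry counter or a "failed" state would be the obvious next addition.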
idorosen|2 years ago
- https://airflow.apache.org/
- https://github.com/spotify/luigi
There are also many Kubernetes-based options out there. For the use case you described, you might even consider a plain old Makefile and incrond if you expect everything to run on a single host and be triggered by a new file showing up in a directory…
I like Airflow because you can give access to the web UI to operators and they can kick/run/stop tasks or graphs of tasks. Both Airflow and Luigi expect you to express your workflow as a DAG in Python code.
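To give a feel for what "workflow as a DAG in Python" means, here is a tiny sketch using only the standard library's `graphlib` (Python 3.9+). This is not the Airflow or Luigi API; `run_dag` and the task names in the usage note are hypothetical. It just shows the underlying idea: declare which tasks depend on which, then run them in dependency order.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, deps):
    """Run every task once, after all of its dependencies.

    tasks: dict mapping task name -> zero-argument callable
    deps:  dict mapping task name -> set of task names it depends on
    Returns the list of task names in the order they were run.
    """
    # Include tasks that have no dependencies so they are not skipped.
    graph = {name: deps.get(name, set()) for name in tasks}
    order = []
    # static_order() yields dependencies before the tasks that need them
    # and raises graphlib.CycleError if the graph is not a DAG.
    for name in TopologicalSorter(graph).static_order():
        tasks[name]()
        order.append(name)
    return order
```

For example, with tasks named `extract`, `transform`, and `load`, and `deps = {"transform": {"extract"}, "load": {"transform"}}`, the tasks run in that order. The real tools layer scheduling, retries, and a UI on top of the same basic structure.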
vsroy|2 years ago
dylanhassinger|2 years ago