We use Apache Oozie (http://oozie.apache.org/), an orchestration system for Hadoop. We don't run days-long workflows, but some of ours have over a dozen steps, and I have no reason to believe Oozie couldn't handle longer-running ones. Oozie has facilities for retrying failed actions based on user-defined behavior, and because it can run shell scripts, Java apps, Spark jobs, and most anything else in the Hadoop ecosystem, I've found it pretty easy to integrate with our other tooling. My one complaint (and it's really more a complaint about YARN) is that it can be quite difficult to get your hands on the logs when a workflow fails. You can get them, but it can be a real pain.

We were running Oozie on Cloudera but are migrating to AWS, and I was pleased to find that it can be installed on an EMR cluster[1] and managed with Hue[2], which has a decent UI for administering the schedule and a visualization of the workflow DAG.
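As a rough illustration of the user-defined retry behavior mentioned above, here is a minimal Python sketch that generates a one-step Oozie workflow running a shell script, with the `retry-max` and `retry-interval` attributes (interval in minutes) set on the action. The workflow name, script name, node names, and the `${jobTracker}`/`${nameNode}` parameters are hypothetical placeholders, and the schema versions (`uri:oozie:workflow:0.5`, `uri:oozie:shell-action:0.2`) are assumptions; check them against the Oozie version your cluster runs.

```python
import xml.etree.ElementTree as ET


def shell_action_workflow(app_name, script, retry_max=3, retry_interval_min=10):
    """Build a single-step Oozie workflow XML string that runs `script` via a
    shell action, with Oozie's user-retry attributes on the action element.
    Node names and ${...} parameters are hypothetical placeholders."""
    wf = ET.Element("workflow-app",
                    {"name": app_name, "xmlns": "uri:oozie:workflow:0.5"})
    ET.SubElement(wf, "start", {"to": "run-script"})

    # retry-max: how many times Oozie re-runs the action after a failure;
    # retry-interval: minutes to wait between attempts.
    action = ET.SubElement(wf, "action", {
        "name": "run-script",
        "retry-max": str(retry_max),
        "retry-interval": str(retry_interval_min),
    })

    shell = ET.SubElement(action, "shell",
                          {"xmlns": "uri:oozie:shell-action:0.2"})
    # Placeholders that Oozie resolves from the job's properties at submit time.
    ET.SubElement(shell, "job-tracker").text = "${jobTracker}"
    ET.SubElement(shell, "name-node").text = "${nameNode}"
    ET.SubElement(shell, "exec").text = script
    # Ship the script into the action's working directory (path#symlink syntax).
    ET.SubElement(shell, "file").text = f"{script}#{script}"

    # Transitions: success goes to the end node, failure to the kill node.
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})

    kill = ET.SubElement(wf, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = (
        "Script failed: [${wf:errorMessage(wf:lastErrorNode())}]")
    ET.SubElement(wf, "end", {"name": "end"})

    return ET.tostring(wf, encoding="unicode")


if __name__ == "__main__":
    print(shell_action_workflow("nightly-etl", "etl.sh"))
```

Calling `shell_action_workflow("nightly-etl", "etl.sh")` returns the XML as a string, which you would save as `workflow.xml` in the application's HDFS directory. Generating it in code is just one option; writing the XML by hand, or in Hue's workflow editor, works just as well.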
[1]: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-conf...
[1]: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-oozi...
[2]: http://gethue.com/tutorial-a-new-ui-for-oozie/