

criticas | 3 years ago

If the batch jobs have few or simple interdependencies, scheduling is the easy part. Are your tasks too complex for cron/at/batch? For example, do they require coordination across machines? If so, that would suggest looking at Slurm/LSF or another distributed job scheduler, or implementing them on Kubernetes. That sounds like overkill in your case, though.

It doesn't sound like a scheduling problem - it sounds like a noticing problem. You have to figure out what to do on failure: email, text, retry, log, etc. (Hence the suggestion of Kubernetes, or another declarative automation system like Ansible or Puppet.) If the requirement is "daemon X should be running", checking for it and sending an email is the easiest and most useless response.
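
To make the "what to do on failure" point concrete, here is a rough sketch in Python of a wrapper that logs each failure, retries, and only notifies once every attempt has failed. The names run_with_retries and notify are made up for illustration and don't come from any particular tool. By default the notify hook just logs an error, so you would have to plug in your own email or text sender.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("batch")


def run_with_retries(
    job: Callable[[], None],
    attempts: int = 3,
    delay_s: float = 0.0,
    notify: Callable[[str], None] = log.error,  # hypothetical hook: swap in email/text
) -> bool:
    """Run job, retrying on any exception; notify once if all attempts fail."""
    for attempt in range(1, attempts + 1):
        try:
            job()
            return True
        except Exception as exc:
            # Log every failure so there is a record even when a retry succeeds.
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt < attempts:
                time.sleep(delay_s)
    # Only escalate after retries are exhausted.
    notify(f"job failed after {attempts} attempts")
    return False
```

A cron entry could call a small script built around this, so the schedule stays simple and the failure policy (retry count, delay, who gets told) lives in one place.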
