criticas | 3 years ago
It doesn't sound like a scheduling problem; it sounds like a noticing problem. You have to decide what happens on failure: email, text, retry, log, etc. (Hence the suggestion of Kubernetes, or another declarative automation system like Ansible or Puppet.) If the requirement is "daemon X should be running", then checking for it and sending an email is the easiest response and also the least useful one.
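To make the "desired state" idea concrete, here is a rough sketch of a reconcile step: check whether the daemon is up, try to restart it a few times, log each attempt, and only notify a human if that fails. The names `reconcile`, `is_running`, `start`, and `notify` are hypothetical, not taken from Kubernetes, Ansible, or Puppet; those tools do this kind of thing with far more care.

```python
import logging
from typing import Callable

log = logging.getLogger("reconcile")


def reconcile(
    is_running: Callable[[], bool],
    start: Callable[[], None],
    notify: Callable[[str], None],
    max_retries: int = 3,
) -> str:
    """Drive a service toward the desired state "running".

    All three callables are hypothetical hooks supplied by the caller.
    Returns "ok", "restarted", or "escalated".
    """
    if is_running():
        return "ok"  # already in the desired state, nothing to do

    for attempt in range(1, max_retries + 1):
        log.warning("service down, restart attempt %d/%d", attempt, max_retries)
        try:
            start()
        except Exception:
            # A failed start is logged, not fatal; the next attempt may work.
            log.exception("start failed")
        if is_running():
            return "restarted"

    # Only bother a human once automatic repair has been exhausted.
    notify(f"service still down after {max_retries} restart attempts")
    return "escalated"
```

Run from cron or a loop, something like this turns "send an email when X is down" into "fix X, and send an email only when the fix did not work".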