grugdev42 | 1 month ago

It sounds like you have four separate problems:

---

1. Make sure your cronjobs ran:

Use a heartbeat monitoring system like this:

https://uptimerobot.com/cron-job-monitoring/

Append a ping to their URL after your cronjob, like so:

* * * * * python /home/foo.py && curl https://example.com/heartbeat

If your cronjob doesn't run, or runs and fails, the heartbeat won't get called because of the &&.
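
If you'd rather keep that logic in Python than in the crontab, here's a rough sketch of the same idea. The names `run_then_ping` and `ping_heartbeat` are made up for illustration, and the URL is just the placeholder from the crontab line above, so swap in whatever your monitoring service gives you:

```python
import subprocess
import urllib.request
from typing import Callable, Sequence

# Placeholder URL (assumption): use the one your monitoring service gives you.
HEARTBEAT_URL = "https://example.com/heartbeat"


def ping_heartbeat(url: str = HEARTBEAT_URL) -> None:
    """Send a GET request to the heartbeat URL (10 second timeout)."""
    with urllib.request.urlopen(url, timeout=10):
        pass


def run_then_ping(
    command: Sequence[str],
    ping: Callable[[], None] = ping_heartbeat,
) -> int:
    """Run `command`; call `ping` only if it exited with status 0.

    Returns the command's exit status so the caller can pass it on.
    """
    result = subprocess.run(command)
    if result.returncode == 0:
        ping()
    return result.returncode
```

Something like `run_then_ping(["python", "/home/foo.py"])` then behaves like the `&&` version: no successful exit, no heartbeat.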

---

2. Make sure your scripts return error codes:

Your scripts should exit with status 0 for success, or a non-zero status for errors.

This ties into point number one. Without proper error codes your heartbeat monitoring won't work.
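
A minimal sketch of one way to do that in Python, assuming your script's real work lives in a single function. `exit_status` is a hypothetical helper name, not something from a library:

```python
import sys
import traceback
from typing import Callable


def exit_status(job: Callable[[], None]) -> int:
    """Run `job` and map the outcome to a process exit status.

    0 means success; 1 means the job raised an exception.
    """
    try:
        job()
    except Exception:
        traceback.print_exc()  # traceback goes to stderr
        return 1
    return 0


# In the real script, finish with something like:
#     sys.exit(exit_status(main))
```

The thing to watch for is a script that catches an exception, prints it, and then falls off the end: that exits with 0 and looks like a success to cron and to the heartbeat.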

---

3. Standardised logging:

Make sure you send/pipe your errors to ONE place. Having to look in multiple places is just asking for trouble.

And then check your logs daily. Better yet, automate the checking... maybe you send the contents of the log to Slack once per day? Or email it to yourself?
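
For Python scripts, one possible way to get everything into one place is to have every job use the same log file through the standard `logging` module. This is only a sketch: `get_job_logger` is a made-up helper, and the log path is whatever you choose to pass in:

```python
import logging


def get_job_logger(job_name: str, log_path: str) -> logging.Logger:
    """Return a logger that appends to `log_path`, tagged with `job_name`."""
    logger = logging.getLogger(job_name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeat calls
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Each script calls it with its own name and the shared path, so every line in the file says which job wrote it and when.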

---

4. More robust scripts:

I'm not trying to be unkind, but your scripts sound like they're erroring a lot!

Maybe they need to be tightened up... don't blindly trust things: check return values, verify in code that the previous step actually worked, and log more information to help you track the problems down.
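
As an example of "verify the previous step", here's a rough sketch for a step that runs an external command which is supposed to write a file. `run_step` and the idea that each step produces one output file are assumptions for illustration:

```python
import os
import subprocess
from typing import Sequence


def run_step(command: Sequence[str], expected_output: str) -> None:
    """Run one step and verify it worked before the script moves on.

    Raises CalledProcessError if the command exits non-zero, and
    RuntimeError if the file it should have written is missing or empty.
    """
    subprocess.run(command, check=True)
    if not os.path.isfile(expected_output):
        raise RuntimeError(f"{command!r} did not create {expected_output}")
    if os.path.getsize(expected_output) == 0:
        raise RuntimeError(f"{command!r} wrote an empty {expected_output}")
```

Failing loudly at the step that broke is a lot easier to debug than a confusing error three steps later.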

---

If you do all of these things I think you will fix your problems. Good luck :)