top | item 39212089

(no title)

brown9-2 | 2 years ago

> we implemented logic to prevent the node from going down at the top of a minute when possible since — given the nature of cron — that is when it is likely that scripts will need to be scheduled to run

Why not smear the start time of the jobs across seconds of that minute to avoid any thundering herd problems? How much functionality relies on a script being invoked at exactly the :00 mark? And if the functionality depends on that exact timing, doesn’t it suggest something is fragile and could be redesigned to be more resilient?
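
A minimal sketch of the kind of smearing I mean, assuming each job has some stable identifier. The names `smear_offset` and `smeared_start` are hypothetical and not taken from the article. Hashing the id gives every job a fixed offset within the minute, so a given job always fires at the same second but the fleet as a whole is spread out:

```python
import hashlib


def smear_offset(job_id: str, window_seconds: int = 60) -> int:
    """Map a job id to a stable offset in [0, window_seconds).

    Assumption: job_id is any stable string identifying the job.
    """
    digest = hashlib.sha256(job_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, then fold it into the window.
    return int.from_bytes(digest[:8], "big") % window_seconds


def smeared_start(scheduled_epoch: int, job_id: str, window_seconds: int = 60) -> int:
    """Return the actual start time: the cron-scheduled time plus the job's offset."""
    return scheduled_epoch + smear_offset(job_id, window_seconds)
```

The offset is deterministic rather than random, so a job's start time stays predictable from run to run, which may matter if anything downstream expects a consistent schedule.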

wutwutwat|2 years ago

At their scale, staggering script start times over a 60-second window likely wouldn’t have much of an impact if they are experiencing a thundering herd, imo. If it did help, it would be a band-aid and a ticking time bomb: sooner or later someone would still have to solve the underlying load problem that staggering the start times only kicked down the road.