

posix_compliant | 1 year ago

What's neat is that this can be modeled with a differential equation. If you kill 5% of instances each hour, the reduction in bad instances is proportional to the current number of bad instances.

i.e.

if bad(t) = fraction of bad instances at time t (t in hours),

bad(0) = 0,

and healthy instances go bad at a rate of 1% per hour,

then

d(bad(t))/dt = -0.05 * bad(t) + 0.01 * (1 - bad(t))

so

bad(t) = 0.166667 - 0.166667 e^(-0.06 t)
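Written out, the step from the ODE to that solution: collect the terms, note that the derivative vanishes at bad = 0.01/0.06 = 1/6, and what remains is plain exponential decay toward that level at rate 0.06 per hour.

```latex
\begin{aligned}
\frac{d\,\mathrm{bad}}{dt} &= -0.05\,\mathrm{bad}(t) + 0.01\,\bigl(1 - \mathrm{bad}(t)\bigr) = 0.01 - 0.06\,\mathrm{bad}(t) \\
\mathrm{bad}_\infty &= \frac{0.01}{0.06} = \frac{1}{6} \approx 0.166667 \\
\frac{d}{dt}\Bigl(\mathrm{bad}(t) - \tfrac{1}{6}\Bigr) &= -0.06\,\Bigl(\mathrm{bad}(t) - \tfrac{1}{6}\Bigr) \\
\mathrm{bad}(t) - \tfrac{1}{6} &= \Bigl(\mathrm{bad}(0) - \tfrac{1}{6}\Bigr)\,e^{-0.06 t} = -\tfrac{1}{6}\,e^{-0.06 t} \\
\mathrm{bad}(t) &= \tfrac{1}{6}\bigl(1 - e^{-0.06 t}\bigr)
\end{aligned}
```

So in the long run about 1 in 6 instances is bad, and the gap to that level shrinks by a factor of e every 1/0.06 ≈ 16.7 hours.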

Which looks a mighty lot like the graph of bad instances in the blog post.
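A quick numerical sanity check, as a minimal Python sketch: it integrates the same ODE with forward Euler and compares the result against the closed form. The rates (5% of instances killed per hour, 1% of healthy instances going bad per hour) come from the equation above; the function names and the step size dt are illustrative choices, not anything from the blog post.

```python
import math

KILL_RATE = 0.05  # fraction of instances killed and replaced per hour
FAIL_RATE = 0.01  # fraction of healthy instances that go bad per hour


def bad_closed_form(t: float) -> float:
    """bad(t) = 1/6 * (1 - e^(-0.06 t)) for the rates above, t in hours."""
    steady = FAIL_RATE / (FAIL_RATE + KILL_RATE)
    return steady * (1.0 - math.exp(-(FAIL_RATE + KILL_RATE) * t))


def bad_simulated(hours: float, dt: float = 0.01) -> float:
    """Forward-Euler integration of the ODE starting from bad(0) = 0."""
    bad = 0.0
    for _ in range(round(hours / dt)):
        # Killing a healthy instance and replacing it with a healthy one
        # changes nothing, so only kills that hit bad instances appear here.
        bad += dt * (-KILL_RATE * bad + FAIL_RATE * (1.0 - bad))
    return bad


if __name__ == "__main__":
    for t in (0, 12, 24, 48, 96):
        print(f"t={t:3d}h  euler={bad_simulated(t):.4f}  closed={bad_closed_form(t):.4f}")
```

With rates this small relative to 1/dt, the two columns should agree to well under 0.001.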

uvdn7 | 1 year ago

Love it! I wonder if the team knew this explicitly or intuitively when they deployed the strategy.

> We created a rule in our central monitoring and alerting system to randomly kill a few instances every 15 minutes. Every killed instance would be replaced with a healthy, fresh one.

It doesn't look like they worked out the numbers ahead of time.
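Working the numbers out ahead of time would not have been much harder. Setting d(bad)/dt = 0 in the equation from the first comment gives a steady-state bad fraction of fail_rate / (fail_rate + kill_rate), which can be inverted to pick a kill rate for a target. Below is a minimal Python sketch. The helper names are hypothetical, and the 1% hourly failure rate is the assumption baked into that equation. The last helper converts an hourly rate into a per-run kill fraction for a schedule like the every-15-minutes rule, assuming kills behave like a constant-rate process.

```python
import math


def steady_state_bad(kill_rate: float, fail_rate: float) -> float:
    """Long-run bad fraction for d(bad)/dt = -kill_rate*bad + fail_rate*(1 - bad)."""
    return fail_rate / (fail_rate + kill_rate)


def kill_rate_for_target(target_bad: float, fail_rate: float) -> float:
    """Hourly kill rate that makes the steady-state bad fraction equal target_bad."""
    if not 0.0 < target_bad < 1.0:
        raise ValueError("target_bad must be strictly between 0 and 1")
    # Solve target = f / (f + k) for k.
    return fail_rate * (1.0 - target_bad) / target_bad


def per_interval_fraction(hourly_rate: float, interval_minutes: float) -> float:
    """Fraction to kill per run so a constant hourly rate is matched.

    Under a constant rate, the chance of surviving an interval of length
    interval_minutes is exp(-hourly_rate * interval_minutes / 60).
    """
    return 1.0 - math.exp(-hourly_rate * interval_minutes / 60.0)
```

For example, per_interval_fraction(0.05, 15) is about 1.24% of instances per 15-minute run, while kill_rate_for_target(0.02, 0.01) comes out to 0.49 per hour, so pushing the bad fraction much lower gets expensive quickly.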