
The Thundering Herd Problem

80 points| ingve | 2 years ago |encore.dev

37 comments

[+] cortesoft|2 years ago|reply
As someone who has worked on very large networks for 15 years, the author left out some important bits about using caching with thundering herd problems.

You really need to make sure that your cache settings will not generate a ton of requests to the origin when new content is requested.

If you have a bunch of clients that are going to request content at nearly the exact same time (this happens a lot with live video streaming, but can also happen when your clients are configured to download something as soon as it becomes available, or on a schedule, etc), you need to make sure your caches are set up so that the second request will wait for the first request to fill from origin rather than send its own cache fill request. Ideally, if 1000 clients all request the same content from your cache, only ONE request should go back to the origin, and the other 999 should wait and use the content retrieved by the first request. Otherwise, your cache isn't going to do anything, since every request is going to require pulling from origin anyway.

In nginx, you do this with something like the proxy_cache_lock setting.
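For reference, a minimal sketch of that nginx setup (upstream name, cache path, and zone name are placeholders; `proxy_cache_path` belongs in the `http` context):

```nginx
proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m;

server {
    location / {
        proxy_pass http://origin_backend;
        proxy_cache app_cache;
        # Only one request per cache key fills from origin; the rest wait.
        proxy_cache_lock on;
        # How long waiters block before falling through to origin themselves.
        proxy_cache_lock_timeout 5s;
        # Optionally serve stale content while a refresh is in flight.
        proxy_cache_use_stale updating;
    }
}
```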

[+] bitexploder|2 years ago|reply
Caching is generally way harder to get right than a lot of people probably think. Having delved into building caching systems that get the behavior and needs of the system just right, I can say it took way more effort and research than I thought it would. You start with something simple, and by the time you have chased down all the edge cases and tuned the cache behavior to your situation, it is often not so simple after all.
[+] omeze|2 years ago|reply
Yep, I've found that, generally, caching logic should be a separate flow from business logic. E.g. wrap your RPC logic such that an in-memory cache transparently sits in your RPC stack. This lets you implement locking on the cache in a single place, and makes it easy to do things like eager prefetching or offloading cache logic to another thread.
[+] rollcat|2 years ago|reply
Adding a little bit of random delay to any scheduled / triggered actions is always a good idea, and is usually very cheap to implement.

If you're on OpenBSD, their cron supports randomized ranges using the ~ operator (see <https://man.openbsd.org/crontab.5>). Otherwise, you can use something like 'sleep $(($RANDOM % 60)) && some-task', but beware that $RANDOM only ranges 0-32767 (15 bits); you won't get the full, uniform 86400s range for daily jobs.

If you are unable to add a different random delay on every invocation (e.g. you have an RSS reader that only allows you to specify an exact interval), pick some nearby prime number, e.g. 37 instead of 30 or 45, or 71 instead of 60 (71-60=11 being a prime is also great).

If you're deploying a whole fleet at once, you can also vary the exact timings of cron.hourly, cron.daily, etc between machines.
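To sketch a workaround for that $RANDOM limitation (bash; `some-task` is a placeholder), you can combine two 15-bit draws so the modulo covers a full day nearly uniformly:

```shell
#!/bin/bash
# $RANDOM is only 15 bits, so combine two draws into a ~30-bit value
# before reducing modulo 86400; the residual modulo bias is negligible.
jitter=$(( (RANDOM * 32768 + RANDOM) % 86400 ))
echo "sleeping ${jitter}s before the daily task"
# sleep "$jitter" && some-task   # what the real cron entry would run
```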

[+] traceroute66|2 years ago|reply
For those who don't have the weird systemd allergy, systemd has many great features, including random scheduling, see `RandomizedDelaySec`.
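A sketch of a timer unit using it (unit names and the window are made up; `FixedRandomDelay` additionally keeps the chosen offset stable per machine, which matches the per-fleet staggering idea above):

```ini
# /etc/systemd/system/refresh-cache.timer
[Unit]
Description=Daily refresh with per-machine jitter

[Timer]
OnCalendar=daily
# Delay each activation by a uniformly random offset in [0, 1h)
RandomizedDelaySec=1h
# Pick the random offset once and reuse it on every activation
FixedRandomDelay=true

[Install]
WantedBy=timers.target
```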
[+] PaulDavisThe1st|2 years ago|reply
Once again, what TFA describes is NOT the thundering herd problem.

https://en.wikipedia.org/wiki/Thundering_herd_problem

> In computer science, the thundering herd problem occurs when a large number of processes or threads waiting for an event are awoken when that event occurs, but only one process is able to handle the event.

(emphasis mine)

[+] gilbetron|2 years ago|reply
Thundering herd has evolved beyond the original definition; that happens. Sometimes terms even flip their meaning (like how "one bad apple" has flipped its meaning, or how "premature optimization is the root of all evil" as used these days isn't what the original author actually meant). But in this case I think most of the new usages fit the basic idea, which is that a large amount of unusual traffic happens outside of normal usage patterns.
[+] Rapzid|2 years ago|reply
Okay but that's actually not the most general definition I'm familiar with.

It's a synchronization and resource constraint issue with various symptoms depending on domain specifics.

Let's discuss specifics about how the post isn't discussing a thundering herd problem in the general sense.

[+] matsemann|2 years ago|reply
Once had to interface with an external API for many of our client requests (booking requests for transportation). Due to certain combinations of queries, it could sometimes be slow to get a response if the requested journey would contain multiple legs and combinations. So we needed to have a high timeout for our requests to that system. Which was fine most of the time.

Unfortunately, if that system started to struggle, all our calls would be slow, or even time out. And of course us continuously spamming them wouldn't help (we were like 95% of their traffic). That didn't just wreak havoc on their side; it completely exhausted all our incoming connections, thread pools, etc., just waiting, with long timeouts, for responses that would never come.

A circuit breaker here was golden. When too many requests in a row were slow, we would break and stop requests from our side. That would allow them to recover on their side, and things on our side to fail fast and let other services still work.

A key here was that the circuit breaker would still allow a small percentage through. How else would you detect that things are back up again? And then slowly let more and more through if things are going fine (so not just open completely all at once).
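That recover-by-trickle behaviour can be sketched roughly like this in Python (thresholds and the doubling ramp are illustrative, not the commenter's actual system):

```python
import random

class CircuitBreaker:
    """Open after consecutive failures, then let a small fraction of
    probe requests through and widen that fraction as probes succeed."""

    def __init__(self, failure_threshold=5, probe_ratio=0.05):
        self.failure_threshold = failure_threshold
        self.probe_ratio = probe_ratio  # traffic share allowed when open
        self.failures = 0
        self.open = False
        self.allow_ratio = 1.0          # fraction of requests let through

    def allow_request(self):
        if not self.open:
            return True
        # While open, admit only a random sample as probes.
        return random.random() < self.allow_ratio

    def record_success(self):
        self.failures = 0
        if self.open:
            # Ramp up gradually instead of snapping fully closed.
            self.allow_ratio = min(1.0, self.allow_ratio * 2)
            if self.allow_ratio >= 1.0:
                self.open = False

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.open = True
            self.allow_ratio = self.probe_ratio
```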

[+] cbsmith|2 years ago|reply
The thundering herd problem comes from throughput decreasing due to overhead when you need to scale up capacity (traditionally due to increased context switching, but these days often from autoscaling activity). While rate-limiting, circuit-breaking, load-shedding, etc. are decent band-aids for the problem, the better solution tends to be finding ways to increase amortization of request processing overhead as request load increases. Caching is a way to do it, but if done improperly it can exacerbate problems. Another common approach, which I didn't see mentioned in the article, is microbatching, where you pay your overhead costs on a per-batch basis (often using streaming interfaces like Kafka). As you fall behind, your batch sizes increase, spreading the batch processing overhead over more and more requests.
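A toy Python simulation of that amortisation effect (not a Kafka client; names and numbers are arbitrary): each cycle, some requests arrive and the consumer drains everything queued so far as one batch, so batches grow as the consumer falls behind.

```python
from collections import deque

def run_consumer(arrivals, handle_batch, max_batch=100):
    """Simulate a microbatching consumer. arrivals[i] is the number of
    requests that show up in cycle i; each cycle we drain the whole
    backlog as one batch (capped at max_batch), so rising load means
    bigger batches and less per-request overhead."""
    queue = deque()
    batch_sizes = []
    for n_arriving in arrivals:
        queue.extend(range(n_arriving))  # requests arriving this cycle
        n = min(len(queue), max_batch)
        batch = [queue.popleft() for _ in range(n)]
        if batch:
            handle_batch(batch)
            batch_sizes.append(len(batch))
    return batch_sizes, len(queue)
```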
[+] pphysch|2 years ago|reply
The single biggest way to mitigate this is to switch from "push" to "pull" architecture where possible. Let the central servers reach out to the "clients" at their own honest pace. Not every process can be modeled this way, but a surprising amount can, particularly internal processes where you control all of the agents.
[+] cbsmith|2 years ago|reply
Ironically, stream processing, which we tend to call "push" architecture, is largely the same solution.
[+] Berniek|2 years ago|reply
Philosophically, the "thundering herd problem" actually CAUSES congestion.

There are 2 ways to handle it.

1. Stop the thundering herd (make all the clients do something different). That may make things worse. Congestion in networks is usually exponential: you can't fulfill a request so you repeat the request, it can't be fulfilled so it repeats. You can add a random delay at the client end, but that is just the US government's answer to the debt problem; it kicks the can down the road in the short term, but it will almost certainly come back and bite you. Mathematically it is very easy for this scenario to become catastrophically exponential once a threshold is reached.

2. Stop the congestion (make the handling process faster, or add processes).

The system already has a cache to handle this, but if the content isn't in the cache it doesn't help. There needs to be an extra request cache exclusively for congestion scenarios. The existing cache request path is already doing some processing, so extend it to route "thundering herd" requests to this second cache, which does a bit more processing as well: as each new request is routed to it, it checks whether this requestor is already present and removes or overwrites the old entry. It should never contain more than one entry per client.

When no more additions are made to this congestion cache (or the rate has slowed significantly), the requests can be forwarded and processed via the original cache system.

Under this configuration, the congestion does not become exponential and should only delay the thundering herd requests. All other requests will be handled as per normal.

Once the original cache has the information there is no need for any thundering herd requests to be routed to the congestion cache.

Some clients will encounter some delays but not all and only on the "thundering herd process".

[+] _whiteCaps_|2 years ago|reply
When working on an embedded product, we used the last few digits of the MAC address to control timing of API accesses. Obviously doesn't work unless you control the whole stack, but for us, it was an easy fix since we had our own OUI.
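That trick might look roughly like this (hypothetical sketch; the function name and the 300s window are made up): hash-like use of the MAC's low octets gives each device a stable, coordination-free slot.

```python
def mac_offset_seconds(mac: str, window: int = 300) -> int:
    """Map the last three octets of a MAC address to a stable delay in
    [0, window) seconds, spreading a fleet's API calls over the window."""
    last_octets = mac.replace(":", "")[-6:]
    return int(last_octets, 16) % window
```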
[+] btbuildem|2 years ago|reply
Wouldn't the obvious precaution here be to roll out the new feature to just a portion of the waitlist?

I guess the risk here is that you'd never hit the same thresholds as when letting it all rip at once, but wouldn't it be a bit more graceful and let you work out at least some of the problems in front of a smaller audience?

[+] muffa|2 years ago|reply
Props to Encore for writing interesting blogs. Also I just want to say I tried the Encore product and it's awesome (I only used it for a hobby project, but still).
[+] AlbertCory|2 years ago|reply
> user journeys

meaning, they tested that if the user does everything the way they're "supposed to" it works.

That explains a lot.

[+] bluelightning2k|2 years ago|reply
Hey - just wanted to give props to the author. I always enjoy Encore content when it pops up here.

Brand is cool too

[+] bluelightning2k|2 years ago|reply
Although I just checked your pricing on a whim and it's the most expensive per member service I've ever seen for devs. By a LOT.

I mean I still like the brand but wow.

[+] he0001|2 years ago|reply
My experience is that too many people don't even know what this problem actually is, and yet it exists in many places, in different shapes.