m104|1 year ago
1. Rate limits don't really protect against backend capacity issues, especially if they are statically configured. Consider rate limits to be "policy" limits, meaning the policy of usage will be enforced, rather than protection against overuse of limited backend resources.
2. If the goal is to protect against bad traffic, consider additional steps besides simple rate limits. It may make sense to perform some sort of traffic prioritization based on authentication status, user/session priority, customer priority, etc. This comes in handy if you have a bad actor!
3. Be prepared for what to communicate or what action(s) to perform if and when the rate limits are hit, particularly from valuable customers or internal teams. Rate limits that will be lifted when someone complains might as well be advisory-only and not actually return a 429.
4. If you need to protect against concertina effects (all fixed windows, or many sliding windows expiring at the same time), add a deterministic offset to each user/session window so that no large group of rate limits can expire at the same time.
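As one illustration of the prioritization idea in point 2, here is a minimal load-shedding sketch: as load rises, raise the minimum priority you still serve. The tiers and thresholds are made up for the sketch, not taken from any particular system.

```python
from enum import IntEnum

class Priority(IntEnum):
    ANONYMOUS = 0
    AUTHENTICATED = 1
    PAYING = 2
    INTERNAL = 3

def shed_threshold(load_fraction: float) -> Priority:
    """As load rises, raise the minimum priority still served.
    Thresholds are illustrative."""
    if load_fraction < 0.7:
        return Priority.ANONYMOUS   # healthy: serve everyone
    if load_fraction < 0.85:
        return Priority.AUTHENTICATED
    if load_fraction < 0.95:
        return Priority.PAYING
    return Priority.INTERNAL        # overloaded: internal traffic only

def admit(request_priority: Priority, load_fraction: float) -> bool:
    # Shed anything below the current floor instead of rejecting uniformly.
    return request_priority >= shed_threshold(load_fraction)
```

The nice property is that a bad actor's unauthenticated traffic is the first thing dropped, while paying customers keep working until the system is nearly saturated.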
Hope that helps someone!
dskrvk|1 year ago
Did you mean non-deterministic (like jitter)?
refibrillator|1 year ago
Long ago I was responsible for implementing a “rate limiting algorithm”, but not for HTTP requests. It was for an ML pipeline, with human technicians in a lab preparing reports for doctors and in dire cases calling their phone direct. Well my algorithm worked great, it reduced a lot of redundant work while preserving sensitivity to critical events. Except, some of the most common and benign events had a rate limit of 1 per day.
So every midnight UTC, the rate limit quotas for all patients would “reset” as the time stamp rolled over. Suddenly the humans in the lab would be overwhelmed with a large amount of work in a very short time. But by the end of the shift, there would be hardly anything left to do.
Fortunately it was trivial to add a random-looking but deterministic per-patient offset (I hashed the patient ID into a numeric offset).
That smoothly distributed the work throughout the day, to the relief of quite a few folks.
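A minimal sketch of that hashing trick (the function names and the choice of SHA-256 are illustrative; any stable hash works):

```python
import hashlib

SECONDS_PER_DAY = 24 * 60 * 60

def window_offset_seconds(patient_id: str) -> int:
    """Map an ID to a stable offset within the daily window.

    Deterministic: the same patient always resets at the same time
    of day, while different patients spread roughly uniformly
    across the 24 hours.
    """
    digest = hashlib.sha256(patient_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SECONDS_PER_DAY

def window_start(now_epoch: int, patient_id: str) -> int:
    """Start of the current daily quota window for this patient."""
    offset = window_offset_seconds(patient_id)
    return ((now_epoch - offset) // SECONDS_PER_DAY) * SECONDS_PER_DAY + offset
```

Instead of every quota resetting at midnight UTC, each patient's window rolls over at their own hashed offset, so resets trickle in all day.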
solatic|1 year ago
Yes and no, there's a little more nuance here. You're correct that the business signing up X new users, each with new rate ~limits~ allocations, does not in and of itself scale up your backend resources, i.e. it's not naively going to vertically scale a Postgres database you rely on. But having a hardware rate limiter in front is like setting the value on the "max" setting on your autoscaler - it prevents autoscaling cost from skyrocketing out of control when the source of the traffic is malicious/result of a bug/"bad"; instead a human is put in the loop to guage that the traffic is "good" and therefore the rate limit should be increased.
> Rate limits that will be lifted when someone complains might as well be advisory-only and not actually return a 429
How does one set an "advisory-only" rate limit that's not a 429? You can still return a body with a 429 with directions on how to ask for a rate limit increase. I don't think of 4xx as meaning that the URL will never return something other than a 4xx, rather that the URL will continue to return 4xx without human intervention. For example, if you're going to write blog.example.com/new-blog-entry, before you publish it, it's a 404, then after the blog post is published, it will return a 200 instead.
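For instance, a 429 can carry a machine-readable body plus a Retry-After header, so callers know both when to retry and how to ask for an increase. The response shape, field names, and URL below are illustrative; plug the result into whatever framework you use.

```python
import json

def rate_limited_response(retry_after_s: int) -> tuple[int, dict, str]:
    """Build a 429 that tells the caller what to do next,
    rather than a bare rejection."""
    body = {
        "error": "rate_limited",
        "detail": "Request quota exceeded for this API key.",
        "retry_after_seconds": retry_after_s,
        # Illustrative URL: point clients at the process for
        # requesting a higher limit instead of leaving them guessing.
        "limit_increase": "https://example.com/docs/rate-limit-increase",
    }
    headers = {
        "Content-Type": "application/json",
        # Standard header clients and proxies already understand.
        "Retry-After": str(retry_after_s),
    }
    return 429, headers, json.dumps(body)
```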
m104|1 year ago
The easiest way to explain this is with a simple sequence of events: the database has a temporary issue; system capacity drops; clients start timing out or getting errors; load amplification kicks in with retries and request queueing; load is now higher than normal while capacity is lower than normal; devs work hard to get the database back in order; the database looks restored but now the system has 3x the load it did before the incident; other heroic efforts are needed to shed load and/or upscale capacity; whew, it's all working again! In the post-mortem there are lots of questions about why rate limiting didn't protect the system. Unfortunately, the rate limit values required to restore the saturated system are far too low for normal usage, and the values needed for normal operation are too high to prevent the system from getting saturated.
Fundamentally, there's really no way for rate limiting (which only understands incoming load) to balance the equation load <= capacity. For that, we need a back-pressure mechanism like circuit breaking, concurrency limiting, or adaptive request queueing. Fortunately, rate limiting and back-pressure work well together and don't have to know about each other.
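A minimal sketch of one of those back-pressure mechanisms, concurrency limiting (the class name and cap are illustrative):

```python
import threading

class ConcurrencyLimiter:
    """Back-pressure by bounding in-flight work.

    Unlike a rate limit, this tracks how much work is *currently*
    outstanding, so it tightens automatically when the backend slows
    down (requests stay in flight longer) and loosens when it recovers.
    """

    def __init__(self, max_in_flight: int):
        self._slots = threading.Semaphore(max_in_flight)

    def try_acquire(self) -> bool:
        # Non-blocking: if all slots are taken, shed the request
        # immediately instead of queueing up even more load.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()
```

Usage is the obvious wrapper: if `try_acquire()` fails, return a 503 (or queue briefly); otherwise handle the request and `release()` in a finally block.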
trevor-e|1 year ago
Like the OP said, this doesn't protect you from going over your system capacity: you can still have 10 million orgs each requesting 10 req/s, which can take down your system while every one of them abides by its rate limit.
jgalt212|1 year ago
That's our primary use case, so I am also curious to hear more.
foota|1 year ago
Ideally, you can provide isolation between users on the same "tier" so that no one user can crowd out others.