To mitigate this case, you could limit capacity in terms of concurrency instead of request rate. Basically, it would be like a fairly acquired semaphore.
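A minimal Python sketch of a fairly acquired semaphore, assuming "fair" means strict FIFO: waiters get permits in arrival order, and a release hands its permit directly to the oldest waiter so a newly arriving caller can't barge ahead of the queue. A plain semaphore typically makes no such ordering promise. The class name `FairSemaphore` is hypothetical, not from any library.

```python
import threading
from collections import deque


class FairSemaphore:
    """Hypothetical counting semaphore that grants permits in strict FIFO order."""

    def __init__(self, permits: int) -> None:
        if permits < 1:
            raise ValueError("permits must be >= 1")
        self._lock = threading.Lock()
        self._available = permits
        # One Event per blocked caller, oldest first.
        self._waiters: deque = deque()

    def acquire(self) -> None:
        with self._lock:
            # Take a free permit only if nobody is queued ahead of us (no barging).
            if self._available > 0 and not self._waiters:
                self._available -= 1
                return
            ticket = threading.Event()
            self._waiters.append(ticket)
        # release() transfers a permit to us directly by setting our ticket.
        ticket.wait()

    def release(self) -> None:
        with self._lock:
            if self._waiters:
                # Hand the permit straight to the oldest waiter.
                self._waiters.popleft().set()
            else:
                self._available += 1

    def __enter__(self) -> "FairSemaphore":
        self.acquire()
        return self

    def __exit__(self, *exc) -> None:
        self.release()
```

Capacity is then expressed as "N requests in flight" rather than "N requests per second", so a client whose requests are slow automatically gets fewer of them admitted per unit time.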
I believe NGINX Plus has a feature that enforces max-conns per IP address. It's a similar solution to what you describe. Of course, that breaks down with respect to fairness when fanout means a request's cost is no longer proportional to its response time.
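To make the per-IP idea concrete, here is a rough Python sketch of a per-key concurrency cap that rejects rather than queues. It does not reproduce the nginx feature or its configuration. `PerKeyConcurrencyLimit` and its methods are hypothetical names. Note that it counts in-flight requests only: a request that fans out to many backend calls still occupies one slot, which is exactly where the fairness argument breaks down.

```python
import threading
from collections import defaultdict


class PerKeyConcurrencyLimit:
    """Hypothetical cap on in-flight requests per client key (e.g. an IP address)."""

    def __init__(self, max_in_flight: int) -> None:
        self._max = max_in_flight
        self._lock = threading.Lock()
        self._in_flight = defaultdict(int)

    def try_acquire(self, key: str) -> bool:
        """Admit the request if this key is under its cap; otherwise reject it."""
        with self._lock:
            if self._in_flight[key] >= self._max:
                return False
            self._in_flight[key] += 1
            return True

    def release(self, key: str) -> None:
        """Call once the admitted request finishes."""
        with self._lock:
            self._in_flight[key] -= 1
            if self._in_flight[key] == 0:
                # Drop idle keys so the table doesn't grow with every client ever seen.
                del self._in_flight[key]
```

A caller would typically answer with an error such as HTTP 503 or 429 when `try_acquire` returns `False`, and call `release` in a `finally` block once the response has been sent.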
hinkley|1 year ago