(no title)
tanveergill | 2 years ago
I think it is beneficial to re-think how we tune queues. Instead of setting a queue size, we should be tuning the max permissible latency in the queue which is what a request timeout actually is. That way, you stay within the acceptable response time SLA while keeping only the serve-able requests in the queue.
Aperture, an open-source load management platform took this approach. Each request specifies a timeout for which it is willing to stay in the queue. And weighted fair queuing scheduler then allocates the capacity (a request quota or max number of in-flight requests) based on the priority and tokens (request heaviness) of each request.
Read more about the WFQ scheduler in Aperture: https://docs.fluxninja.com/concepts/scheduler
Link to Aperture's GitHub: https://github.com/fluxninja/aperture
Would love to hear your thoughts on our approach!
No comments yet.