
I agree that queues can cause problems, especially when misconfigured. But some amount of queuing is necessary to absorb short spikes where demand exceeds capacity. Queues also make it possible to re-order requests by criticality. With a zero-size queue, every request has to be either dropped immediately or admitted without considering its priority.

I think it is worth rethinking how we tune queues. Instead of setting a queue size, we should tune the maximum time a request may wait in the queue, which is effectively what a request timeout is. That way you stay within the acceptable response-time SLA while the queue only holds requests that can still be served in time.
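To make the idea concrete, here is a minimal Python sketch of a queue bounded by wait time instead of length. It is a hypothetical illustration, not Aperture's implementation: `DeadlineQueue`, `enqueue`, `dequeue`, `timeout_s` and the injectable `clock` are names made up for this example. Each request records the deadline by which it is still worth serving, and expired requests are shed when they reach the head of the queue instead of being served late.

```python
import time
from collections import deque
from typing import Any, Callable, Optional


class DeadlineQueue:
    """Hypothetical FIFO queue bounded by per-request wait time, not by length."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock            # injectable so the example is deterministic
        self._items: deque = deque()   # (deadline, request) pairs in arrival order
        self.dropped: list = []        # requests that timed out while queued

    def enqueue(self, request: Any, timeout_s: float) -> None:
        # The deadline is the latest moment the request is still worth serving.
        self._items.append((self._clock() + timeout_s, request))

    def dequeue(self) -> Optional[Any]:
        now = self._clock()
        while self._items:
            deadline, request = self._items.popleft()
            if now <= deadline:
                return request             # still within its latency budget
            self.dropped.append(request)   # expired: shed it rather than serve it late
        return None                        # nothing servable is queued


# Usage with a fake clock: "a" waits longer than its 0.5 s budget and is shed.
t = [0.0]
q = DeadlineQueue(clock=lambda: t[0])
q.enqueue("a", timeout_s=0.5)
q.enqueue("b", timeout_s=2.0)
t[0] = 1.0
print(q.dequeue())  # b
print(q.dropped)    # ['a']
```

Note that the queue never needs a length limit: during a spike it grows, but anything that would blow its own latency budget is dropped rather than served.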

Aperture, an open-source load management platform, took this approach. Each request specifies a timeout: how long it is willing to wait in the queue. A weighted fair queuing (WFQ) scheduler then allocates capacity (a request quota or a maximum number of in-flight requests) based on each request's priority and tokens (request heaviness).
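As a rough illustration of how priority and tokens can combine in a fair-queuing scheduler, here is a hypothetical Python sketch. It is not Aperture's scheduler or API: `WFQScheduler`, `submit`, `admit`, `release`, `workload` and `capacity_tokens` are invented for this example, and it uses the simpler self-clocked fair queuing (SCFQ) approximation of WFQ. Each request gets a virtual finish tag of `start + tokens / priority`, so heavier requests and lower priorities sort later, and requests are admitted in tag order while the in-flight token budget allows.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class _Queued:
    finish_tag: float                      # primary sort key: virtual finish time
    seq: int                               # tie-breaker: submission order
    name: str = field(compare=False)
    tokens: float = field(compare=False)


class WFQScheduler:
    """Hypothetical SCFQ-style scheduler with an in-flight token limit."""

    def __init__(self, capacity_tokens: float):
        self.capacity = capacity_tokens
        self.in_flight = 0.0
        self._virtual_time = 0.0
        self._last_finish = {}             # workload -> last assigned finish tag
        self._heap = []
        self._seq = itertools.count()

    def submit(self, name: str, workload: str, priority: float, tokens: float) -> None:
        # A workload's next request starts where its previous one finished,
        # or at the current virtual time if that workload has been idle.
        start = max(self._virtual_time, self._last_finish.get(workload, 0.0))
        finish = start + tokens / priority  # heavier or lower-priority => later tag
        self._last_finish[workload] = finish
        heapq.heappush(self._heap, _Queued(finish, next(self._seq), name, tokens))

    def admit(self) -> list:
        # Admit in finish-tag order while the head request fits in capacity.
        admitted = []
        while self._heap and self.in_flight + self._heap[0].tokens <= self.capacity:
            item = heapq.heappop(self._heap)
            self._virtual_time = item.finish_tag  # SCFQ: clock follows served tags
            self.in_flight += item.tokens
            admitted.append(item.name)
        return admitted

    def release(self, tokens: float) -> None:
        # Called when an admitted request completes and frees its tokens.
        self.in_flight -= tokens


s = WFQScheduler(capacity_tokens=3)
for i in range(1, 4):
    s.submit(f"checkout-{i}", workload="checkout", priority=10, tokens=1)
    s.submit(f"batch-{i}", workload="batch", priority=1, tokens=1)
print(s.admit())  # ['checkout-1', 'checkout-2', 'checkout-3']
s.release(1)
print(s.admit())  # ['batch-1']
```

In a real system this would be combined with the per-request queue timeout from the previous sketch, so that low-priority requests which cannot be admitted in time are dropped instead of waiting indefinitely.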

Read more about the WFQ scheduler in Aperture: https://docs.fluxninja.com/concepts/scheduler

Link to Aperture's GitHub: https://github.com/fluxninja/aperture

Would love to hear your thoughts on our approach!
