
tanveergill | 2 years ago

We have been building an open-source platform called Aperture that tries to solve this very problem. Our approach is to let the developer decide how long a request may wait in the queue, using a timeout parameter. If the timeout is hit, the caller can retry with exponential backoff or shed the load. While in the queue, requests are prioritized by a weighted fair queuing algorithm. If the app has customer tiers, a policy can allocate the majority of the capacity to paid customers over free ones. This allocation is not static, though: if there is spare capacity in the system, the free tier can take all of it. This is similar to how Linux allocates CPU time based on nice values: even low-priority processes can use all of the CPU time when demand is low. Apart from relative allocation across tiers, Aperture's request scheduler can also ensure fairness across individual users within each tier, so that no single user hogs all of the server capacity.
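To make the scheduling idea concrete, here is a minimal Python sketch of a weighted fair queue. It is not Aperture's implementation: the class name `WeightedFairQueue`, the tier names, and the 3:1 weights are made up for illustration, and it uses a simple self-clocked approximation of weighted fair queuing with one flow per tier (per-user fairness inside a tier is left out).

```python
import heapq
import itertools
from collections import defaultdict


class WeightedFairQueue:
    """Toy weighted fair queue (self-clocked variant, hypothetical names).

    Each request gets a virtual finish time; the request with the
    smallest finish time is served first.
    """

    def __init__(self, weights):
        self.weights = dict(weights)           # tier -> relative weight
        self.virtual_time = 0.0                # advances as requests are served
        self.last_finish = defaultdict(float)  # tier -> finish time of its last request
        self.heap = []
        self.seq = itertools.count()           # tie-breaker for equal finish times

    def enqueue(self, tier, request, cost=1.0):
        # A higher weight means a smaller step in virtual time per request,
        # so that tier is scheduled more often while it has backlog.
        start = max(self.virtual_time, self.last_finish[tier])
        finish = start + cost / self.weights[tier]
        self.last_finish[tier] = finish
        heapq.heappush(self.heap, (finish, next(self.seq), tier, request))

    def dequeue(self):
        if not self.heap:
            return None
        finish, _, tier, request = heapq.heappop(self.heap)
        self.virtual_time = max(self.virtual_time, finish)
        return tier, request


# Both tiers are backlogged: paid gets roughly 3x the share of free.
q = WeightedFairQueue({"paid": 3, "free": 1})
for i in range(8):
    q.enqueue("paid", f"paid-{i}")
    q.enqueue("free", f"free-{i}")
first_eight = [q.dequeue()[0] for _ in range(8)]

# Only the free tier has demand: it gets all of the capacity.
idle = WeightedFairQueue({"paid": 3, "free": 1})
for i in range(3):
    idle.enqueue("free", f"free-{i}")
only_free = [idle.dequeue()[0] for _ in range(3)]
```

The weights only matter while tiers compete. When one tier is idle, the other is served back to back, which is the work-conserving behavior described above.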

Demand and capacity are determined based on either a request rate quota or a maximum number of in-flight requests.
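As a rough illustration of the in-flight variant combined with a wait timeout, here is a small Python sketch built on the standard library's `threading.BoundedSemaphore`. The `InFlightLimiter` class and its `run` method are hypothetical names, not part of Aperture, and the sketch ignores prioritization entirely.

```python
import threading


class InFlightLimiter:
    """Caps concurrent requests; callers wait at most `timeout` seconds (hypothetical helper)."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def run(self, fn, timeout):
        # Wait for a free slot, but give up once the timeout expires.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no capacity available within the timeout")
        try:
            return fn()
        finally:
            self._slots.release()


limiter = InFlightLimiter(max_in_flight=1)

# While the only slot is busy, a second request times out and can be
# retried with backoff or shed by the caller.
def busy_request():
    try:
        limiter.run(lambda: "inner", timeout=0.01)
    except TimeoutError:
        return "rejected"
    return "admitted"

outcome = limiter.run(busy_request, timeout=1.0)
after = limiter.run(lambda: 42, timeout=0.01)
```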

Would love the community here to check us out on GitHub and provide feedback: https://github.com/fluxninja/aperture
