Good post, a bit too “mathy”, but it makes me think of “Asynchronous computing @Facebook: Driving efficiency and developer productivity at Facebook scale”, where they touch on capacity optimization (queuing + time shifting) and capacity regulation, along with user delay tolerance (because not all jobs, even at the same priority level, are equal).
SpaceManNabs|6 months ago
For example, the paragraph with "compute the exact Poisson tail (or use a Chernoff bound)" and the paragraphs around it would be better illustrated with a few lines of math than with mostly prose.
I think you do need some math if you want to approach this probabilistically, but I agree that might not be the most accessible approach, and a hard threshold calculation is more accessible and maybe just as good.
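Here is a small Python sketch that makes the probabilistic-versus-threshold comparison concrete. It assumes arrivals in a time window are Poisson with mean `lam` and that capacity is a fixed per-window count `c`. Both are assumptions for illustration, not taken from the post. The code finds the smallest `c` whose overload probability P(X > c) stays below a target `eps`. It does this twice: once with the exact Poisson tail and once with the Chernoff bound P(X ≥ k) ≤ e^(-λ)(eλ/k)^k, valid for k > λ. It then compares both with a hypothetical "mean plus three standard deviations" hard threshold. All function names are made up for this sketch.

```python
import itertools
import math

def poisson_tail(lam: float, k: int) -> float:
    """Exact P(X >= k) for X ~ Poisson(lam), summing the upper tail directly.

    Fine for moderate lam (a few hundred at most); scipy.stats.poisson.sf
    is the robust choice for production use.
    """
    if k <= 0:
        return 1.0
    # P(X = k) computed in log space so the factorial doesn't overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while True:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
        # Past the mode the terms shrink geometrically; stop once negligible.
        if i > lam and term <= 1e-18 * total:
            return total

def chernoff_bound(lam: float, k: int) -> float:
    """Chernoff upper bound on P(X >= k): e^-lam * (e*lam/k)^k, for k > lam."""
    if k <= lam:
        return 1.0  # the bound is only informative above the mean
    return math.exp(-lam + k * (1.0 + math.log(lam) - math.log(k)))

def min_capacity(lam: float, eps: float, tail) -> int:
    """Smallest c with P(X > c) = P(X >= c + 1) <= eps under the given tail estimate."""
    for c in itertools.count():
        if tail(lam, c + 1) <= eps:
            return c

def threshold_capacity(lam: float, z: float = 3.0) -> int:
    """Hypothetical hard-threshold rule: mean plus z standard deviations (sd = sqrt(lam))."""
    return math.ceil(lam + z * math.sqrt(lam))

if __name__ == "__main__":
    lam, eps = 100.0, 1e-6  # hypothetical: 100 arrivals per window, 1-in-a-million overload target
    print("exact tail: ", min_capacity(lam, eps, poisson_tail))
    print("chernoff:   ", min_capacity(lam, eps, chernoff_bound))
    print("mean + 3 sd:", threshold_capacity(lam))
```

The Chernoff version can only over-provision relative to the exact tail, because the bound never underestimates it. The threshold rule ignores `eps` entirely, which is exactly what makes it easier to reason about.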
cogman10|6 months ago
Particularly because distributed computer systems aren't pure math problems to be solved. Load often comes from usage, which behaves more like random input than like predictable variables. Further, how load is processed depends on a bunch of things, from the OS scheduler to the current load on the network.
It can be hard to intuitively understand that a bottlenecked system processes the same load more slowly than an unbounded one.
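A standard way to see this is the M/M/1 queue. It is an idealized single-server model with Poisson arrivals at rate λ and exponential service at rate μ. Real systems (OS scheduling, network contention) are messier, so treat this as a sketch under those assumptions. The mean time in system is 1/(μ − λ). At a fixed service rate, the same jobs therefore take 2, 5, 10, 20 and 100 times the bare service time at 50%, 80%, 90%, 95% and 99% utilization. The simulation below uses Lindley's recursion to reproduce that from random arrivals. The function names are hypothetical.

```python
import random

def mm1_mean_sojourn(lam: float, mu: float) -> float:
    """Closed-form mean time in system for an M/M/1 queue: 1 / (mu - lam)."""
    if lam >= mu:
        return float("inf")  # arrivals outpace service: the queue grows without bound
    return 1.0 / (mu - lam)

def simulate_mean_sojourn(lam: float, mu: float, n_jobs: int = 200_000, seed: int = 0) -> float:
    """Estimate the same quantity with Lindley's recursion over random arrivals."""
    rng = random.Random(seed)
    wait = 0.0   # time the current job spends queued before its service starts
    total = 0.0
    for _ in range(n_jobs):
        service = rng.expovariate(mu)   # random service time, mean 1/mu
        total += wait + service         # time in system for this job
        gap = rng.expovariate(lam)      # random gap until the next arrival
        wait = max(0.0, wait + service - gap)
    return total / n_jobs

if __name__ == "__main__":
    mu = 1.0  # service rate; bare service time is 1/mu = 1
    for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
        lam = rho * mu
        print(f"utilization {rho:.2f}: formula {mm1_mean_sojourn(lam, mu):6.1f}, "
              f"simulated {simulate_mean_sojourn(lam, mu):6.1f}")
```

Near saturation the simulated averages get noisy, which is itself a symptom of the long, bursty queues a bottleneck produces.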
ignoramous|6 months ago
motorest|6 months ago
I feel that I'm missing something obvious. Isn't this doc reinventing the wheel in terms of what very basic task queue systems already do? It describes task queues and task prioritization, and how it supports tasks that cache user data. What am I missing?
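The "very basic task queue" referred to here can be sketched in a few lines. This is a hypothetical minimal illustration, not the design from the post, and the names `PriorityTaskQueue`, `submit` and `run_next` are made up for it. It uses an in-memory heap where a lower number means higher priority, plus a counter so that equal-priority tasks run in submission order.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass(order=True)
class _Entry:
    priority: int                                  # lower value runs first
    seq: int                                       # tie-breaker: FIFO within a priority
    task: Callable[[], Any] = field(compare=False) # never compared, only run

class PriorityTaskQueue:
    """Hypothetical in-memory priority task queue (single process, no persistence)."""

    def __init__(self) -> None:
        self._heap: list[_Entry] = []
        self._counter = itertools.count()

    def submit(self, priority: int, task: Callable[[], Any]) -> None:
        heapq.heappush(self._heap, _Entry(priority, next(self._counter), task))

    def run_next(self) -> Any:
        # Pop the highest-priority (lowest number), oldest task and run it.
        return heapq.heappop(self._heap).task()

    def __len__(self) -> int:
        return len(self._heap)
```

It deliberately leaves out persistence, retries, deadlines, and any notion of delay tolerance or capacity limits.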