evaXhill | 6 months ago

Good post, a bit too “mathy”, but it makes me think of “Asynchronous computing @Facebook: Driving efficiency and developer productivity at Facebook scale”, where they touch on capacity optimization (queuing + time shifting), capacity regulation along with user delay tolerance (because not all jobs, even at the same priority level, are equal).

SpaceManNabs|6 months ago

I think the issue with the math is that it doesn't read well.

For example, the paragraph containing "compute the exact Poisson tail (or use a Chernoff bound)", along with the paragraphs around it, would be easier to follow as a few lines of math than as mostly prose.
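
A minimal sketch of what those few lines of math might look like, assuming (since the post's exact setup isn't quoted in this thread) that arrivals per interval are Poisson with mean `lam` and that `capacity` is the number of jobs the system can absorb per interval. It compares the exact tail P(N >= k) with the standard Chernoff bound P(N >= k) <= e^(-lam) (e·lam/k)^k, which only says something useful for k > lam. The function names and the numbers are illustrative, not from the post.

```python
import math


def poisson_pmf(lam: float, n: int) -> float:
    """P(N = n) for N ~ Poisson(lam), lam > 0, computed in log space to avoid overflow."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def poisson_tail(lam: float, k: int) -> float:
    """Exact P(N >= k) = 1 - P(N <= k - 1). Loses precision for tails below ~1e-12."""
    return max(0.0, 1.0 - sum(poisson_pmf(lam, n) for n in range(k)))


def chernoff_bound(lam: float, k: int) -> float:
    """Chernoff upper bound on P(N >= k): e^(-lam) * (e * lam / k)^k, evaluated in log space."""
    if k <= lam:
        return 1.0  # the bound is trivial at or below the mean
    return math.exp(-lam + k * (1.0 + math.log(lam / k)))


# Hypothetical example: 100 arrivals per interval on average, room for 130.
lam, capacity = 100.0, 130
print(f"exact tail:     {poisson_tail(lam, capacity):.5f}")
print(f"chernoff bound: {chernoff_bound(lam, capacity):.5f}")
```

The bound is always at least the exact tail, so it errs on the side of over-provisioning; the appeal is that it is a one-line closed form, whereas the exact tail needs a sum.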

I think you do need some math if you want to approach this probabilistically, but I agree that might not be the most accessible approach; a hard threshold calculation is more accessible and maybe just as good.
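
To make the trade-off concrete, here is a hedged sketch under the same Poisson assumption: probabilistic sizing picks the smallest capacity `c` with P(N > c) <= `eps`, and the "hard threshold" is modeled as a hypothetical rule of thumb that provisions the mean load times a fixed `headroom` factor. The 1.5 headroom and `eps = 1e-3` are illustrative choices, not values from the post.

```python
import math


def poisson_pmf(lam: float, n: int) -> float:
    """P(N = n) for N ~ Poisson(lam), lam > 0, computed in log space."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def capacity_for_overload_prob(lam: float, eps: float) -> int:
    """Smallest integer capacity c such that P(N > c) <= eps (probabilistic sizing)."""
    if lam <= 0 or not (1e-9 <= eps < 1):
        raise ValueError("need lam > 0 and 1e-9 <= eps < 1")
    c = 0
    cdf = poisson_pmf(lam, 0)  # running P(N <= c)
    while 1.0 - cdf > eps:
        c += 1
        cdf += poisson_pmf(lam, c)
    return c


def capacity_hard_threshold(lam: float, headroom: float = 1.5) -> int:
    """Hypothetical rule of thumb: provision the mean load times a fixed headroom factor."""
    return math.ceil(lam * headroom)


for lam in (10.0, 100.0, 1000.0):
    print(lam, capacity_for_overload_prob(lam, 1e-3), capacity_hard_threshold(lam))
```

Under these assumptions the fixed-headroom rule is only "just as good" in a band: Poisson fluctuations grow like sqrt(lam) rather than in proportion to lam, so a fixed multiplier under-provisions at low mean load and over-provisions at high mean load.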

cogman10|6 months ago

For something like this, annotated graphs and examples (IMO) work a lot better than formulas in explaining the problem and solution.

Particularly because distributed computer systems aren't pure math problems to be solved. Load usually comes from user activity, which behaves more like random input than like a predictable variable. Further, how load is processed depends on a bunch of things, from the OS scheduler to the current load on the network.

It can be hard to build an intuition for why a bottlenecked system processes the same load more slowly than an unconstrained one.
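
In the spirit of "examples over formulas", a small simulation can build that intuition. This sketch assumes the textbook M/M/1 model (Poisson arrivals, exponential service times, one FIFO server), which is far simpler than a real system with OS scheduling and network effects; it simulates queueing delay with Lindley's recursion and compares the result with the closed-form mean time in system, 1/(mu - lambda). The rates are made up for illustration.

```python
import random


def simulate_mm1(arrival_rate: float, service_rate: float,
                 n_jobs: int = 200_000, seed: int = 1) -> float:
    """Mean time in system (queueing + service) for a single FIFO server, by simulation."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable: arrival_rate must be below service_rate")
    rng = random.Random(seed)
    wait = 0.0            # queueing delay of the current job
    total_sojourn = 0.0
    for _ in range(n_jobs):
        service = rng.expovariate(service_rate)
        total_sojourn += wait + service
        gap = rng.expovariate(arrival_rate)    # time until the next job arrives
        wait = max(0.0, wait + service - gap)  # Lindley's recursion: next job's queueing delay
    return total_sojourn / n_jobs


def mm1_mean_sojourn(arrival_rate: float, service_rate: float) -> float:
    """Closed-form M/M/1 mean time in system: 1 / (mu - lambda)."""
    return 1.0 / (service_rate - arrival_rate)


# Same offered load (80 jobs/s) on a healthy server and on a bottlenecked one.
for mu in (100.0, 85.0):
    print(mu, mm1_mean_sojourn(80.0, mu), simulate_mm1(80.0, mu))
```

With the formula, the healthy server averages 1/20 s = 50 ms per job, while cutting its service rate by 15% to 85 jobs/s raises that to 1/5 s = 200 ms: the load is identical, but latency quadruples because the server now runs close to saturation.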

ignoramous|6 months ago

> Asynchronous computing @Facebook: Driving efficiency and developer productivity ... optimization (queuing + time shifting), capacity regulation along with user delay tolerance ...

  We can infer a more detailed priority by understanding how long each of these asynchronous requests can be delayed ... For each job to be executed, we try to execute it as close as possible to its delay tolerance.

  ... we defer jobs with a long delay tolerance so that the workload is spread over a longer time window. Queueing plays an important role in selecting the most urgent job to execute first.

  ...  Time shifting ... optimize capacity in Async:

  1. Predictive compute collects the data people used yesterday. Predicated on which data people may need, it precomputes before peak hours and stores the data in cache ... This moves the computing lift from peak hours to off-peak hours and trades a little cache miss for better efficiency. 

  2. Deferred compute ... schedules a job as part of user request handling but runs at much later time. For instance, the "people you may know" list is processed during off-peak hours, then shown when people are online (generally during peak hours). 
https://engineering.fb.com/2020/08/17/production-engineering... / https://archive.vn/A87hl
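
The delay-tolerance idea in the quoted passage can be sketched as earliest-deadline-first scheduling with a fixed per-slot capacity, where each job's deadline is its arrival slot plus its delay tolerance. This is a hedged toy model, not Facebook's Async implementation: the slot granularity, the `schedule` function, and the job names are hypothetical. It is also work-conserving (it runs jobs as soon as capacity frees up) rather than deliberately waiting until a job's tolerance is nearly used, and it only reports a missed deadline when the late job is dequeued.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    deadline: int                   # arrival slot + delay tolerance; the heap key
    name: str = field(compare=False)


def schedule(arrivals: dict[int, list[tuple[str, int]]],
             capacity: int, horizon: int) -> dict[int, list[str]]:
    """Run at most `capacity` jobs per slot, most urgent (earliest deadline) first.

    arrivals maps slot -> [(job_name, delay_tolerance_in_slots), ...].
    Returns slot -> names of the jobs executed in that slot.
    """
    queue: list[Job] = []
    executed: dict[int, list[str]] = {}
    for slot in range(horizon):
        for name, tolerance in arrivals.get(slot, []):
            heapq.heappush(queue, Job(slot + tolerance, name))
        ran: list[str] = []
        while queue and len(ran) < capacity:
            job = heapq.heappop(queue)
            if job.deadline < slot:
                raise RuntimeError(f"{job.name} missed its deadline")
            ran.append(job.name)
        executed[slot] = ran
    return executed


# Hypothetical peak: three jobs arrive in slot 0, an urgent one in slot 1.
plan = schedule({0: [("notify", 0), ("digest", 2), ("pymk", 5)], 1: [("ping", 0)]},
                capacity=1, horizon=5)
print(plan)
```

With a capacity of one job per slot, the urgent `ping` arriving in slot 1 jumps ahead of `digest`, and the tolerant `pymk` job slides to slot 3, well inside its deadline: the peak is spread over later slots without any job running late.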

motorest|6 months ago

> Asynchronous computing @Facebook

I feel that I'm missing something obvious. Isn't this doc reinventing the wheel compared to what very basic task queue systems already do? It describes task queues, task prioritization, and how the system supports tasks that cache user data. What am I missing?