rockmeamedee | 3 years ago

As a regular programmer, I got really into queuing theory thinking I was going to learn secrets of performance tuning, then was slightly disappointed.

But it turns out the simple parts of it go a long way! E.g. at work we have a single deployment queue for a monorepo. To a first approximation this is an M/D/1 queue (a deploy job takes roughly the same amount of time every time, though arrivals are actually way spikier than Poisson), and I realized that wait time grows in proportion to utilization / (1 - utilization), i.e. it blows up as you approach full capacity.

While the infra team was saying “we do X deploys per day out of 4X and are only at 25% capacity”, I realized even hitting 2X would more than double the already bad wait time.
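
To see why, here's a minimal back-of-the-envelope sketch using the mean queue wait for an M/D/1 queue (the 30-minute deploy time is a made-up number, and md1_wait is my own name, not from any library):

    # Mean time a job waits in an M/D/1 queue, from Pollaczek-Khinchine
    # with zero service-time variance: Wq = rho * s / (2 * (1 - rho)),
    # where s is the fixed service time and rho the utilization.
    def md1_wait(service_time, utilization):
        return utilization * service_time / (2 * (1 - utilization))

    for rho in (0.25, 0.50, 0.75):
        print(f"rho = {rho:.2f}: mean wait = {md1_wait(30, rho):.1f} min")
    # rho = 0.25: mean wait = 5.0 min
    # rho = 0.50: mean wait = 15.0 min
    # rho = 0.75: mean wait = 45.0 min

Going from 25% to 50% utilization triples the mean wait, and the deterministic service time is the best case; spikier arrivals only make it worse.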

What happened is that a few initiatives were under way to increase capacity, but then out of nowhere the queue got logjammed (because of the high arrival-rate variability) and we had to switch to GitLab merge trains, which run CI concurrently on the optimistic result of merging. I wrote about it here: https://engineering.outschool.com/posts/doubling-deploys-git...
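
For context on how trains avoid the serial queue, here's a hypothetical sketch of the scheduling idea (the names and data structures are mine, not GitLab's actual implementation):

    # Each MR's pipeline tests main plus every MR ahead of it in the
    # train, optimistically assuming those will all merge. All of
    # these pipelines can run concurrently.
    def train_pipelines(main, train):
        jobs, base = [], (main,)
        for mr in train:
            base = base + (mr,)
            jobs.append((mr, base))
        return jobs

    # When a pipeline fails, its MR is kicked from the train. MRs
    # behind it were testing a state that included it, so their
    # pipelines restart on the new base; MRs ahead of it tested the
    # same state as before.
    def on_failure(main, train, failed_mr):
        return train_pipelines(main, [mr for mr in train if mr != failed_mr])

    print(train_pipelines("main", ["A", "B", "C"]))
    # [('A', ('main', 'A')), ('B', ('main', 'A', 'B')), ('C', ('main', 'A', 'B', 'C'))]
    print(on_failure("main", ["A", "B", "C"], "B"))
    # [('A', ('main', 'A')), ('C', ('main', 'A', 'C'))]
    # B is kicked; C restarts on a new base, while A's state is unchanged.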

I’m planning on writing a blog post about the math of CI/CD deploy queues as G/D/1 queues.

For a programmer's view of queuing theory and great performance-testing foundations, I highly recommend "Analyzing Computer System Performance with Perl::PDQ" (don't worry about the Perl, the book is very relevant) http://www.perfdynamics.com/iBook/ppa_new.html, which shows examples of queues inside computer systems and how to model them. The author has a nice little library, PDQ, to model the different computer systems you come across.
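
As a taste, a single queueing center like the deploy pipeline can be modeled in a few lines with PDQ. This is a sketch from memory of the Python binding, so treat the exact function and constant names as assumptions, and the rates are invented:

    # Sketch of an open single-queue model in PDQ ("Pretty Damn Quick").
    # API names are from memory and may differ by version; units are hours.
    import pdq

    pdq.Init("deploy queue")
    pdq.CreateOpen("deploys", 2.0)                 # 2 deploy requests arrive per hour
    pdq.CreateNode("pipeline", pdq.CEN, pdq.FCFS)  # one FCFS queueing center
    pdq.SetDemand("pipeline", "deploys", 0.25)     # each deploy needs 0.25 h of service
    pdq.Solve(pdq.CANON)                           # solve the open circuit
    pdq.Report()                                   # utilization, queue length, residence time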

I liked realizing that the dreaded "coordinated omission" problem in load testing (when your generator can only produce X rps and the server can handle more than that, your numbers are bunk) is actually what happens when you think you are modeling an open system but don't have enough resources, so you end up seeing closed-system behaviour.
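
To make that open-vs-closed distinction concrete, here is a toy back-of-the-envelope sketch (all numbers invented): an open generator sends on a fixed schedule no matter what, while a closed one waits for an in-flight slot to free up.

    # Toy numbers contrasting open and closed load against a server
    # that can complete at most CAPACITY requests per second.
    CAPACITY = 80      # server's max throughput, requests/second
    TARGET = 100       # intended offered load, requests/second
    THREADS = 80       # closed generator: max requests in flight
    LATENCY = 1.0      # seconds per request once the server saturates

    # Open system: arrivals keep coming on schedule, so past capacity
    # the backlog (and thus measured latency) grows without bound.
    backlog_growth = TARGET - CAPACITY            # +20 requests/s of backlog

    # Closed system: the generator can't send until a slot frees up,
    # so it silently self-throttles to THREADS / LATENCY requests/second.
    closed_offered = min(TARGET, THREADS / LATENCY)

    print(f"open: backlog grows by {backlog_growth} req/s; latency blows up")
    print(f"closed: offered load quietly capped at {closed_offered:.0f} req/s")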

kqr | 3 years ago

That's not what coordinated omission is. CO happens when your load generator can do, say, 500 requests/second, and you're aiming for something well below that, e.g. 100, but! the top throughput for the server is 80 requests per second, and instead of building up an infinite queue, the load generator throttles itself to roughly 80 requests per second.

Why would you build a load generator like this? Normally because you run out of threads -- you have 800 parallel requests in flight and you can't open a new one until one has returned.

Correcting for CO takes a mathematical sleight of hand.
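
For the curious, one common form of that sleight of hand (in the spirit of what wrk2 and HdrHistogram-based tools do) is to measure each latency from the request's *intended* send time on the schedule, rather than from its actual, throttled send time. A minimal sketch:

    # Sketch of a coordinated-omission correction: charge each request's
    # latency from when the schedule intended to send it, not from when
    # the stalled generator actually sent it. Numbers below are invented.
    def corrected_latencies(intended_interval, completions):
        # completions: completion timestamps in send order; request i
        # should have been sent at i * intended_interval.
        return [round(done - i * intended_interval, 3)
                for i, done in enumerate(completions)]

    # Intended: 100 rps (one request every 0.01 s). The generator
    # stalled and sent requests 2 and 3 roughly half a second late.
    dones = [0.005, 0.015, 0.505, 0.515]
    print(corrected_latencies(0.01, dones))
    # [0.005, 0.005, 0.485, 0.485] -- the stall is counted as latency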

pramodbiligiri | 3 years ago

That was a neat blog post! Can you clarify this bit: "If a pipeline fails, the associated MR is kicked from the train and all preceding pipelines are restarted." - Do you mean all the pipelines _already started_ are restarted? Because I'm guessing that if a pipeline fails, the preceding pipelines shouldn't be affected, strictly speaking.