(no title)
vvern | 8 months ago
I understand there are practical reasons why you might want to just choose a concurrency and let it rip at a fixed warehouse size and say, “I ran TPC-C”, but you didn’t!
TPC-C when run properly is effectively an open-loop benchmark that scales where the load scales with the dataset size by having a fixed number of workers per warehouse (2?) that each issue transactions at some rate. It’s designed to have a low level of builtin contention that occurs based on the frequency of cross warehouse transactions, I don’t remember the exact rate but I think it’s something like 10%.
The benchmark has an interesting property that if the system can keep up with the transaction load by processing transactions quickly, it remains a low contention workload but if it falls behind and transactions start to pile up, then the number of contending transactions in flight will increase. This leads to non-linear degradation mode even beyond what normally happens with an open loop benchmark — you hit some limit and the performance falls off a cliff because now you have to do even more work than just catching up on the query backlog.
When you run without think time, you make the benchmark closed loop. Also, because you’re varying the number of workers without changing the dataset size (because you have to vary something to make your pretty charts), you’re changing the rate at which any given transaction is going to be on the same warehouse. So, you’ve got more contending transactions generally, but worse than that, because of Amdahl’s law, the uncontended transactions will fly through, so most of the time for most workers will be spend sitting waiting on contended keys.
transactional|8 months ago