What are some of the other problems with rwlocks? I genuinely ask, because I used them with quite good success for a pretty complicated use-case at large scale and very high concurrency.
I have as well. I find RW locks much easier to use than, say, a recursive mutex, mainly because it took me a long time to understand how a recursive mutex actually works in the first place. If you want to stick to the stdlib, you aren't left with many choices. At least in the STL.
Mostly that if you actually have both readers and writers, they obstruct each other; this is often undesirable. And you have to pick some bias (reader- or writer-preference) in advance. You can also get priority inversion because readers are anonymous: the lock doesn't track which threads hold it in shared mode, so there's no way to apply priority inheritance and boost a low-priority reader that's blocking a high-priority writer.
Sure, the workload had both readers and writers, and it was a pretty "bursty" one with a high volume of data that had to scale across all the cores. So, not a particularly light workload. It was basically a high-concurrency cache that was write-mostly in the first phase (ingestion) and read-mostly in the second phase (crunching). It had to support multiple sessions simultaneously, so in the end it was about supporting heavily mixed read-write workloads, e.g. the second phase of session no. 1 could overlap with the first phase of session no. 2.
To avoid lock contention I managed to get away with sharding across an array of shared mutexes and load-balancing the sessions by their UUIDs. This worked pretty well: after ~10 years it's still rock solid, and the workloads are basically ever-changing.
I considered RCU for this use-case too, but I figured it wouldn't be as good a fit: because the workload is essentially heavily mixed, I thought it would put a lot of strain on the memory subsystem by having to handle multiple copies of the data (which was not small).
One thing I don't understand is the priority inversion and how it may happen here. I'll think about it, thanks.
ethin|8 days ago
loeg|8 days ago
menaerus|7 days ago