(no title)
Fiahil
|
2 years ago
I used the same approach while designing a lock-free bounded broadcast log (as in a "RWLock<Vec<T>>"; a MCMP, append-only Vec). It's quite easy to do because it's bounded. However, I could not find a way to make it both unbounded and efficient.Any ideas ?
duped|2 years ago
The alternative is to dive into the literature on lock-free mpmc queues, which are kind of gnarly to implement. A lot of the literature handwaves ABA problems with stuff like "use hazard pointers and RCU" without correct pseudo code to help you.
That's why locking in unbounded queues is popular, imo. It's not actually inefficient, and the alternatives are arcane enough to avoid. No one is going to understand or trust the code anyway.
It's worth mentioning that "lock free" means "one task does not prevent other tasks from making progress" and in the case of a bounded queue, you can trivially accomplish that by busy-waiting when the queue is full. This isn't appropriate if you need consumers to receive events in the order in which they happen, but you can kind of fix that using an atomic counter (to paraphrase Mike Acton, if you have a safe counter, you have a safe queue) and time stamp events/sort on which ones are consumed.
Fiahil|2 years ago
Yes and I do think they are fundamentally incorrect or they cannot be translated to the log use case.
I also agree that, performance-wise, the lock make little difference. In fact my old benchmarks tend to point that locks were MORE EFFICIENT than atomics on ARM64 (MBP M1), for example. It's more like a fun little exercise, and also to confirm that I'm not completely dumb and that the problem is not solvable with the simple use of 2 atomics and a counter.
paholg|2 years ago
So, everyone has a pointer to the current one, and there's an atomic pointer to the next.
When the log fills up, you allocate a new one and set the next pointer to it using compare_and_swap. If someone beat you to it, you can walk the list and add yours to the end so that the allocation isn't wasted.
This way, most of the time you're using the efficient bounded log.
Fiahil|2 years ago
So the linked list would involve going through all the links and following end tail refs/pointers. It would make reading O(n) and that's a Nope.
However, you could imagine having a Vec index that contains a ref to all allocated inner-logs, and query the index first in order to obtain the buffers' location. That works, but then the index has to go through a lock (either a RWLock or a mutex) as the CaS operation isn't enough if we get to the end of the index and it needs to be reallocated. It's fine, and I think that's the most appropriate solution.
PS : In fact, there is a sweet spot where you'd like to have a semi-unbounded log. If your index is big enough to contain something like 4Md entries, you'd probably end-up splitting the log in several pieces for archiving and performances purposes. Loading the log (fully or partially) from disk efficiently is then more important than having a real unbounded log. Then you would not necessarily use a lock and could CaS in the index.
jamesmunns|2 years ago
I've used tokio's broadcast channel quite a bit before, but it is also bounded. After talking to Eliza from the tokio project, I'm fairly convinced that unbounded queues are a scary thing to have around, operationally :).
But again - this is a bit out of my actual expertise!
jayshua|2 years ago
Fiahil|2 years ago