top | item 39552099

(no title)

Fiahil | 2 years ago

I used the same approach while designing a lock-free bounded broadcast log (as in a "RWLock<Vec<T>>"; a MCMP, append-only Vec). It's quite easy to do because it's bounded. However, I could not find a way to make it both unbounded and efficient.

Any ideas ?

discuss

order

duped|2 years ago

It's very hard to beat locking when the queue needs grow. It's the performance statistics - you're going to grow more often as the app warms up, reach a steady state, and only need to grow under heavier than expected load. And in that case you probably aren't going to see performance dominated by time spent waiting to acquire a write lock.

The alternative is to dive into the literature on lock-free mpmc queues, which are kind of gnarly to implement. A lot of the literature handwaves ABA problems with stuff like "use hazard pointers and RCU" without correct pseudo code to help you.

That's why locking in unbounded queues is popular, imo. It's not actually inefficient, and the alternatives are arcane enough to avoid. No one is going to understand or trust the code anyway.

It's worth mentioning that "lock free" means "one task does not prevent other tasks from making progress" and in the case of a bounded queue, you can trivially accomplish that by busy-waiting when the queue is full. This isn't appropriate if you need consumers to receive events in the order in which they happen, but you can kind of fix that using an atomic counter (to paraphrase Mike Acton, if you have a safe counter, you have a safe queue) and time stamp events/sort on which ones are consumed.

Fiahil|2 years ago

> The alternative is to dive into the literature on lock-free mpmc queues, which are kind of gnarly to implement. A lot of the literature handwaves ABA problems with stuff like "use hazard pointers and RCU" without correct pseudo code to help you.

Yes and I do think they are fundamentally incorrect or they cannot be translated to the log use case.

I also agree that, performance-wise, the lock make little difference. In fact my old benchmarks tend to point that locks were MORE EFFICIENT than atomics on ARM64 (MBP M1), for example. It's more like a fun little exercise, and also to confirm that I'm not completely dumb and that the problem is not solvable with the simple use of 2 atomics and a counter.

paholg|2 years ago

I'm no expert here, but I wonder if a linked list of bounded logs would work well.

So, everyone has a pointer to the current one, and there's an atomic pointer to the next.

When the log fills up, you allocate a new one and set the next pointer to it using compare_and_swap. If someone beat you to it, you can walk the list and add yours to the end so that the allocation isn't wasted.

This way, most of the time you're using the efficient bounded log.

Fiahil|2 years ago

in the case of a bounded log, readers are expected to give an offset at which they want to perfom the read (kafka-like).

So the linked list would involve going through all the links and following end tail refs/pointers. It would make reading O(n) and that's a Nope.

However, you could imagine having a Vec index that contains a ref to all allocated inner-logs, and query the index first in order to obtain the buffers' location. That works, but then the index has to go through a lock (either a RWLock or a mutex) as the CaS operation isn't enough if we get to the end of the index and it needs to be reallocated. It's fine, and I think that's the most appropriate solution.

PS : In fact, there is a sweet spot where you'd like to have a semi-unbounded log. If your index is big enough to contain something like 4Md entries, you'd probably end-up splitting the log in several pieces for archiving and performances purposes. Loading the log (fully or partially) from disk efficiently is then more important than having a real unbounded log. Then you would not necessarily use a lock and could CaS in the index.

jamesmunns|2 years ago

I am not sure! Most of the data structures I design are for embedded systems without allocators. On the desktop, I mostly defer to others.

I've used tokio's broadcast channel quite a bit before, but it is also bounded. After talking to Eliza from the tokio project, I'm fairly convinced that unbounded queues are a scary thing to have around, operationally :).

But again - this is a bit out of my actual expertise!

jayshua|2 years ago

Not sure if it could be extended here, but I've seen a lock free hash map that supported lock free reallocation by allocating new space and moving each entry one by one, either when the entry is accessed or in a separate thread concurrently. Accessing an entry during the reallocation would check the new region first, and if not found check the old region. Entries in the old region would be marked as moved and once all entries were moved the old allocation could be freed.

Fiahil|2 years ago

For the (un)bounded logs, the whole concept reside on the fact that the log isn't going to move once allocated, and that references to an item will never be invalided until the end of the program