(no title)
mjpt777 | 11 years ago
So if "killed mid-operation" must be supported then I don't see how many algorithms can be said to make progress. Take for example the Lamport SPSC queue[1], if the producer gets killed mid operation between steps 4 and 5 of the push operation. Then the data is in the queue but the consumer is blocked from ever seeing it with this line of reasoning. The Lamport SPSC queue is considered wait-free by the concurrency community I know. I base my reasoning on this. What if the producer takes a pause for a long time between steps 4 and 5 before continuing?
However if to be wait-free an algorithm must allow all other threads to continue using the data structure, not just continue making progress in other ways by being non-blocking and completing in a finite number of steps, then I stand corrected.
If wait-free must include coping with any thread being killed mid operation is there a term for being lock-free but also having all threads not block and complete in a finite number of steps for their interaction with the algorithm?
danbruc|11 years ago
In case of the SPSC queue the requirement is that the writer must be able to enqueue new items no matter what the reader does and the reader must be able to dequeue all items for which the enqueue operation succeeded no matter what the writer does afterwards. The presented queue implementation meets this requirements. If the writer fails between steps 4 and 5 the write did not succeed and it does not matter that the reader can not see the item.
What you say about keeping the data structure usable for other threads seems correct to me.
You must not confuse lock-free and not using a lock, these are two different things. An algorithm using locks is a blocking algorithm but using locks is only sufficient, not mandatory. If no process can block other processes unconditionally the algorithm is non-blocking and obstruction-free, lock-free and wait-free are different progress guarantees for non-blocking algorithms in ascending order of strength.
One last point, if you try to obtain a lock and fail to do so you are considered blocked, no matter if you spin, suspend the thread or do unrelated things to waste some time until retrying later.
mjpt777|11 years ago
Each thread under the algorithm can perform their actions in a finite number of steps without ever blocking. This means the producers can continue to do other work. The consumer can continue to consume from other log buffers without being blocked and complete in a finite number of steps. If a producer is killed mid operation then no further progress can be made on that log buffer. If this is considered blocking then the algorithm is blocking and therefore not wait-free. It would need to be killed by another malicious thread for this to happen.
What is clear is that this algorithm gives the best latency profile of of all the measured messaging systems and the highest throughput. I now have the challenge of searching for a name that best describes its behaviour.