I was watching the very good oral history of Leslie Lamport from the Computer History Museum and was struck by how he said that the Bakery Algorithm was the thing he was most proud of, despite Paxos being much more elaborate and, in my mind, important. He also described the Bakery Algorithm as the only thing he "discovered" rather than "invented"; he later clarified that this was because it "came out of nowhere", with no precedent, to his knowledge, in any previous ideas of his own or anyone else's.
Although it seems pretty simple on the surface, it came as a surprise to many people who had been working hard on the mutual exclusion problem in the early 1970s, with one of Lamport's colleagues (Anatol Holt) describing it as "impossible" upon first hearing it. Not only did it completely solve the mutual exclusion problem, but it managed to do so without requiring atomicity of reads or writes, and it was fully described and proved in just 3 pages. This was something that both Dijkstra and Knuth were unable to do despite each writing at least one paper on the subject prior to Lamport's solution in 1974.
Anyway, I thought it might be fun to demonstrate the principles of how it works in Python. In some ways, Python is uniquely unsuited for this because it has a Global Interpreter Lock (GIL), the main purpose of which is precisely to avoid the kinds of mutual exclusion issues that the Bakery Algorithm solves! Still, we can "fake it" by using the multiprocessing library and by introducing random delays at various points in the code.
The papers by Dijkstra and Knuth that you mention are I think:
• (1965 Dijkstra) "Solution of a problem in concurrent programming control" (in CACM)
• (1966 Knuth) "Additional comments on a problem in concurrent programming control" (Letter to CACM, reprinted with an addendum as Chapter 13 of his Selected Papers on Design of Algorithms)
[From the addendum and comments on the latter, I don't get the impression that it had an error (and I don't get that impression from Lamport's comments in the transcripts either). He cites Gary L. Peterson's "Myths About the Mutual Exclusion Problem" (1981, https://zoo.cs.yale.edu/classes/cs323/doc/Peterson.pdf) as a later improvement though. But I may have misunderstood, as I have not read any of these papers/letters or even understood the question :)]
> Even if this read and write are not atomic, the system still works
This statement is not really true on modern computers, especially on ARM and other architectures with weaker memory ordering guarantees than x86 CPUs.
The reason is the lack of memory barriers. The CPU and the compiler are free to reorder loads and stores; memory barriers are inserted between them to enforce a specific ordering.
You need an acquire memory barrier after acquiring a lock to make sure that any loads and stores inside the critical section do not get reordered to before the lock was grabbed.
So while the algorithm may work for acquiring a critical section without atomics, it does not enforce that the stuff inside the critical section stays inside the critical section within the same thread.
In fact, most normal loads and stores on ARM are "atomic", but to achieve consistency in concurrent programming they are preceded or followed by an appropriate memory barrier instruction, e.g. in std::atomic_load or std::atomic_store.
Actually, acquire/release barriers or operations are not enough. I'm pretty sure you also need a #StoreLoad barrier, for example between Step 1 and Step 2 of the lock algorithm, to prevent the loads of other threads' state in Step 2 from being reordered before the stores to the thread's own state in Step 1.
Generally #StoreLoad is required when you need bidirectional synchronization between two or more threads.
Re atomics, I think the algorithm still requires word sized operations to be atomic, but I think that was considered a given. What the algorithm doesn't require is atomic load/stores (or more complex RMW) across more than one word.
I tried to think about how this might go wrong. The part I can see going wrong is the 'choosing' array: an optimizer may combine the two writes and just write 'false' immediately. I feel like that breaks a lot of reasonable guarantees, but let's run with it.
So basically the problem is that two processes might pick the same number, see the other process with an (outdated) lower number and choosing=false, and go ahead into the critical section.
So you need a guarantee that the write to the choosing array lands before you pick a number. You can simplify things a bit by setting 'number' to a high sentinel value instead of using two arrays, but still.
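A sequential toy simulation of that failure interleaving (the variable names and the step-by-step schedule are mine; this just replays the stale reads each process would see):

```python
# Two processes, both about to take ticket 1. P0's writes to
# choosing[0] and number[0] have not yet become visible to P1,
# e.g. because the choosing write was optimized away or reordered.
p1_sees_choosing0 = False  # P1 never observes P0 in its doorway
p1_sees_number0 = 0        # stale: P0's ticket write hasn't landed
p1_ticket = 1

# P1's wait-loop check against process 0: it doesn't wait on
# choosing[0], and the stale number[0] == 0 looks like "not competing".
p1_enters = (not p1_sees_choosing0) and not (
    p1_sees_number0 != 0 and (p1_sees_number0, 0) < (p1_ticket, 1))

# P0 also holds ticket 1 and wins the (ticket, pid) tie-break against
# P1's real ticket, so it proceeds as well.
p0_ticket = 1
p0_enters = (p0_ticket, 0) < (p1_ticket, 1)

assert p1_enters and p0_enters  # both in the critical section at once
```

With the choosing write guaranteed to land first, P1 would have spun on choosing[0] until P0's real ticket was visible, and the tie-break would have stopped it.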
>You need an acquire memory barrier after acquiring a lock to make sure that any loads and stores inside the critical section do not get reordered to before the lock was grabbed.
What do you mean? That loads and stores inside a critical section could, in fact, be reordered by the CPU and be executed before grabbing a lock?
One additional clarification is useful to make, which is WHY the algorithm is robust to non-atomic reads and writes.
Basically, if one process reads a value while it is being written by another process, the reader may get not just an old value but a complete garbage value from a partially completed write. Even then the algorithm still works, and doesn’t require any kind of “retry” logic, for this reason:
Recall that when a process wants to enter the critical section, it first looks at the numbers (or tickets) assigned to other processes to determine its own number. The process chooses a number that is one higher than the highest number it observes.
Now, if a process reads a garbage value due to a concurrent write, the worst that can happen is that it picks a different ticket than it otherwise would have. The safety argument doesn’t depend on the scan reading anything sensible: whatever ticket the process ends up with, the waiting phase still forces it to defer to every process with a smaller (ticket, pid) pair. A garbage read can change the ordering or make a process wait longer, but it cannot let two processes into the critical section at once.
The process then enters a waiting phase where it checks if it’s its turn to enter the critical section, based on its number and others’. Again, the presence of a garbage value in one of the reads does not impede this process, as the algorithm only requires a relative ordering based on numbers.
Despite the possibility of reading invalid values, the bakery algorithm maintains safety (no two processes are in the critical section simultaneously) and liveness (every process eventually gets its turn). This is because its core logic - ordering based on numbers and waiting for its turn - remains intact.
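One way to see the scan’s robustness concretely (the function name and the garbage value are made up; this illustrates just the too-large case):

```python
def choose_ticket(observed_numbers):
    # The doorway step: take a ticket one higher than the largest
    # number this process happened to read for the other processes.
    return 1 + max(observed_numbers, default=0)

# Clean scan: the others hold tickets 3 and 5, so we take 6.
assert choose_ticket([3, 5, 0]) == 6

# A torn read returns garbage (say 9999): we take 10000. The ticket is
# bigger than necessary, so we wait behind everyone holding a real
# ticket, but it is still a valid ticket and safety is unaffected.
assert choose_ticket([3, 9999, 0]) == 10000
```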
But stochastically, it’s possible you’re given a legal value that violates mutual exclusion, no?
I’m very fuzzy on how the busy-wait loop that checks all sibling threads’ ticket numbers can distinguish this scenario. Because you could be TID 1 and obtain ticket X even though TID 2 got there first and entered the critical section; at the time it checked, you hadn’t even been assigned a ticket yet. I’m missing something subtle about the algorithm.
> The Bakery Algorithm's unique feature is its ability to ensure mutual exclusion in concurrent programming without requiring atomic reads and writes.
This is the key, and somewhat surprising. Had someone told me a critical section was possible without hardware support (atomic reads and writes), my knee-jerk reaction would have been "not possible".
If I understand this correctly, the algorithm depends on the underlying platform to ensure correct ordering. Two processes may each get the same bakery ticket number. The conflict is resolved by the process with the lower process id being allowed to go first. This process ID is in effect a ticket number issued by the underlying hardware. If the processes were running in a distributed environment, there would not be a common underlying platform to resolve this conflict.
In that context you could simply pick another tie-breaker, like whichever IP address is lower when expressed as an integer (i.e., remove the periods and treat it as a number).
But it's not a distributed environment, it is a shared-memory environment. That is the whole point. If it were a distributed environment, the processes would not have to worry about smashing each other's memory, because they would not share memory in the first place.
[+] [-] eigenvalue|2 years ago|reply
Here is the oral history video btw:
https://www.youtube.com/watch?v=SXt3-iZpQQc
[+] [-] svat|2 years ago|reply
And here are the transcripts: https://archive.computerhistory.org/resources/access/text/20... (Part 1, 2016-08-12) and https://archive.computerhistory.org/resources/access/text/20... (Part 2, 2016-11-11)
(All four available from https://www.computerhistory.org/collections/oralhistories/?s...)
[+] [-] otrack|2 years ago|reply
https://lamport.azurewebsites.net/tla/boulangerie.html