(EDITED; misread algorithm earlier.) No, sorry, that doesn't work. Consider the following sequence:
1. A sets g1 = A
2. A reads g2 = none
3. A reads g1 = A
4. B sets g1 = B
5. B reads g2 = none
6. B reads g1 = B
7. A sets g2 = A (read in #3)
8. A reads g1 = B
9. A retries
10. A reads g2 = A
11. A succeeds
12. B sets g2 = B (read in #6)
13. B reads g1 = B
14. B reads g2 = B
15. B succeeds
You also are missing memory fences, which are needed to ensure proper ordering of memory operations. (Otherwise the processor is free to speculate reads and delay writes between cores.) See https://www.kernel.org/doc/Documentation/memory-barriers.txt for a thorough explanation.
I can see a few problems with this code, please consider the following constructive criticism.
for (;;) {
if (state->g2 == locker_id) {
return;
}
while (state->g2 != SPIN_LOCK_NO_LOCKER) {
if (spin_wait_callback != NULL) {
spin_wait_callback();
}
You don't need this 'continue', the loop will do that just fine without it.
continue;
}
The following four lines are problematic:
state->g1 = locker_id;
if (state->g2 == SPIN_LOCK_NO_LOCKER) {
Imagine the consequences of a task switch right here.
state->g2 = state->g1;
This if does nothing.
if (state->g1 == locker_id) {
And you're going to do the following if without all the extra
conditionals above when you do not return from here the
next time you iterate through the loop. So all that extra
testing of state->g1 above does nothing.
if (state->g2 == locker_id) {
return;
}
}
}
}
I think I can see what you're trying to do here but once you
fix the basic problems with the code and you test it under
very high load locking and unlocking a resource you'll find
that fairly soon two threads will be accessing the resource
at the same time.
Try a very tight loop that locks a counter, does a load, increment, store
on it and then unlocks the counter again. Keep separate tallies
per thread and use a third thread to keep track of the expected
sum and the actual sum.
That should show you reasonably quickly why atomicity in the
basic operation of acquiring a lock is a base requirement.
Locking issues almost never show up when you are running lightly loaded on a single core. But the lack of ensuring all cores see the same information, that race conditions can't occur and that instructions are executed in the right order will show up with a terrible certainty under load on multiple cores.
so, the attempt here (line 32 which you believe is problematic) I think can be pre-empted.
if thread 1 is pre-empted by thread 2 and thread 2 is able to get to the same point (line 33) then G1 will be equal to 2. both threads will assign G2 = 2 and thread 2 will win the race -- thread 1 will loop and begin to spin again.
Both the compiler and CPU will reorder operations. Proper locking requires a memory barrier which will prevent both CPU and compiler from doing that. For more information I recommend reading up on double checked locking. DDJ had a good pair of articles the first where they presented the faulty algorithm, the second where they talked about why it was broken.
There seems to be an implicit assumption in this code that writes which one thread makes will be seen by other threads in the same order. This is not the case on modern architectures. You need both compiler and machine-level memory barriers to ensure this. C's "volatile" keyword does not provide either.
if (state->g2 == SPIN_LOCK_NO_LOCKER) {
state->g2 = state->g1;
if (state->g1 == locker_id) {
if (state->g2 == locker_id) {
return;
}
}
}
You need to insert memory barriers between the writes and the reads. Without, on systems with multiple cores, other cores may not see the state changes in program order, which will make it possible for multiple threads to conclude they won the race.
The example code he provides doesn't even compile:
real_test.c: In function ‘main’:
real_test.c:14:10: error: too many arguments to function ‘spin_allocate’
spin-lock.h:20:32: note: declared here
real_test.c:16:3: error: too few arguments to function ‘spin_lock’
spin-lock.h:19:13: note: declared here
It seems like this is just storing a thread id in some shared variables a few times and double checking that your id "made it" and you win the race. My understanding of why this doesn't work is without atomics or memory barriers, what you read might not reflect what another core has in its cache, and when you write, it might not be immediately visible to others. So while in your thread's view of the world you might have successfully written those 2 words and won the race, it might not matter to the other guy. Though this depends on the consistency model of the CPU. The table here comes to mind: http://en.m.wikipedia.org/wiki/Memory_ordering - my memory is that x86 is generous in how much it lets you get away with this, you might have more trouble running on say, power.
compare-and-swap instruction itself may not be privileged operation but user space spin-lock seldom work in user-space (it's dangerous at least), because a user thread could be preempted.
[+] [-] colanderman|12 years ago|reply
1. A sets g1 = A
2. A reads g2 = none
3. A reads g1 = A
4. B sets g1 = B
5. B reads g2 = none
6. B reads g1 = B
7. A sets g2 = A (read in #3)
8. A reads g1 = B
9. A retries
10. A reads g2 = A
11. A succeeds
12. B sets g2 = B (read in #6)
13. B reads g1 = B
14. B reads g2 = B
15. B succeeds
You also are missing memory fences, which are needed to ensure proper ordering of memory operations. (Otherwise the processor is free to speculate reads and delay writes between cores.) See https://www.kernel.org/doc/Documentation/memory-barriers.txt for a thorough explanation.
[+] [-] keithgabryelski|12 years ago|reply
if step #9 happens then #10 will be skipped by the "if (state->g2 == SPIN_LOCK_NO_LOCKER) {"
what am i missing?
[+] [-] mgraczyk|12 years ago|reply
[+] [-] jacquesm|12 years ago|reply
I can see a few problems with this code, please consider the following constructive criticism.
You don't need this 'continue', the loop will do that just fine without it. The following four lines are problematic: Imagine the consequences of a task switch right here. This if does nothing. And you're going to do the following if without all the extra conditionals above when you do not return from here the next time you iterate through the loop. So all that extra testing of state->g1 above does nothing. I think I can see what you're trying to do here but once you fix the basic problems with the code and you test it under very high load locking and unlocking a resource you'll find that fairly soon two threads will be accessing the resource at the same time.Try a very tight loop that locks a counter, does a load, increment, store on it and then unlocks the counter again. Keep separate tallies per thread and use a third thread to keep track of the expected sum and the actual sum.
That should show you reasonably quickly why atomicity in the basic operation of acquiring a lock is a base requirement.
Locking issues almost never show up when you are running lightly loaded on a single core. But the lack of ensuring all cores see the same information, that race conditions can't occur and that instructions are executed in the right order will show up with a terrible certainty under load on multiple cores.
[+] [-] keithgabryelski|12 years ago|reply
if thread 1 is pre-empted by thread 2 and thread 2 is able to get to the same point (line 33) then G1 will be equal to 2. both threads will assign G2 = 2 and thread 2 will win the race -- thread 1 will loop and begin to spin again.
have i missed something?
[+] [-] jpollock|12 years ago|reply
C/c++ doesn't have many guarantees about operation ordering. http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLo...
[+] [-] colin_mccabe|12 years ago|reply
The Linux kernel guys wrote an in-depth explanation of why volatile doesn't work for this use-case here: https://www.kernel.org/doc/Documentation/volatile-considered...
[+] [-] comex|12 years ago|reply
http://www.di.ens.fr/~pouzet/cours/systeme/bib/dijkstra.pdf
http://phoenix.goucher.edu/~kelliher/cs42/sep30.html
[+] [-] tsukikage|12 years ago|reply
http://en.wikipedia.org/wiki/Memory_barrier http://en.wikipedia.org/wiki/Weak_consistency
[+] [-] mgraczyk|12 years ago|reply
[+] [-] Moral_|12 years ago|reply
[+] [-] asveikau|12 years ago|reply
[+] [-] nkurz|12 years ago|reply
[+] [-] unknown|12 years ago|reply
[deleted]
[+] [-] frozenport|12 years ago|reply
What is a `machine level exchange instruction`?
[+] [-] mbell|12 years ago|reply
http://en.wikipedia.org/wiki/Compare-and-swap
[+] [-] michaelmior|12 years ago|reply
[+] [-] tsu|12 years ago|reply
[+] [-] keithgabryelski|12 years ago|reply