I think this can change the semantics, though: with the preceding check you can miss the shared variable being decremented from another thread. In some cases, such as when the shared value is monotonic, this is fine, but not in the general case.
With a relaxed ordering I'm not sure that's right, since the ldumax would have no imposed ordering relation with the (atomic) decrement on another thread, and so could very well have operated on the old value obtained by the non-atomic load.
All operations on a single memory location are always totally ordered in a cache-coherent (CC) system, no matter how relaxed the memory model is.
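To make the single-location point concrete, here is a minimal sketch in Rust, assuming a plain `AtomicU64` shared between threads. The function name `concurrent_max` and its parameters are hypothetical, made up for illustration. Every thread uses only `Ordering::Relaxed`, yet the read-modify-write operations on the one location still fall into a single modification order, so the final value is the global maximum.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: spawns `threads` workers, each applying a relaxed
/// `fetch_max` with the values `t * per_thread .. (t + 1) * per_thread`.
/// Returns the value left in the shared atomic after all workers finish.
fn concurrent_max(threads: u64, per_thread: u64) -> u64 {
    let shared = Arc::new(AtomicU64::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for i in 0..per_thread {
                    // Relaxed is enough here: atomic RMWs on a single
                    // location are totally ordered with each other.
                    shared.fetch_max(t * per_thread + i, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for handle in handles {
        // join() synchronizes with the worker, so the load below observes
        // the final value in the modification order.
        handle.join().unwrap();
    }

    shared.load(Ordering::Relaxed)
}
```

Calling `concurrent_max(4, 1000)` should leave the largest submitted value in the atomic. This only illustrates per-location ordering; it says nothing about ordering between different memory locations, which is where relaxed orderings actually differ.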
Also, am I understanding correctly that n is the number of threads in your example? Don't you find it suspicious that the number of operations goes up as the thread count goes up?
edit: OK, you are saying that under heavy contention the check avoids having to do the store at all. This is racy, and whether it is correct would be very application-specific.
edit2: I thought about this a bit, and I'm not sure I can come up with a scenario where the race matters...
edit3: ... as long as all threads are only doing atomic_max operations on the memory location, which an implementation can't assume.
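For reference, the check-then-RMW pattern being discussed might look roughly like the sketch below. This is an assumption about the shape of the code, not the original poster's implementation; `max_with_check` is a hypothetical name, and the boolean return value is added only so the skipped case is observable.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical helper: performs an atomic max, but first does a plain
/// relaxed load and skips the RMW when the observed value is already
/// at least `candidate`. Returns true if the RMW was actually issued.
fn max_with_check(shared: &AtomicU64, candidate: u64) -> bool {
    // Cheap read: under contention this avoids taking the cache line
    // exclusively when the update would be a no-op.
    if shared.load(Ordering::Relaxed) >= candidate {
        return false;
    }
    // The load may already be stale, so the RMW is still required to
    // apply the max against whatever the current value is.
    shared.fetch_max(candidate, Ordering::Relaxed);
    true
}
```

If every writer only ever performs max operations, the value is monotonic, so a load that sees a value at least as large as `candidate` implies the current value is too, and skipping the RMW changes nothing. Once another thread can decrement or otherwise lower the value, the skipped case is the racy one described above, and whether it is acceptable depends on the application.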
anematode|5 months ago
gpderetta|5 months ago
ibraheemdev|5 months ago