liblfds-temp's comments
liblfds-temp | 1 year ago | on: Making Sense of Acquire-Release Semantics
I would disagree. I think the Intel docs do not specify a guarantee of flushing, and so if the SP and SC are on different cores, then in principle (though not in practice) the SC could in fact never see what the SP emits.
liblfds-temp | 1 year ago | on: C Style: My favorite C programming practices (2014)
Read the type for all the below right-to-left, substituting the word "pointer" for "*".
int long long unsigned wibble; // unsigned long long int
double const *long_number; // pointer to a const double
double volatile * const immutable_pointer; // immutable pointer to a volatile double
They all read correctly now, when read right-to-left. It's not just "const" you do this for, as per the advice. Do it for all qualifiers.
liblfds-temp | 1 year ago | on: Making Sense of Acquire-Release Semantics
That's the thing - I may be wrong, but I'm under the impression store buffers do not guarantee anything. I'm pretty certain they do not on Intel, for example. All you read in the docs is "flushed in a reasonable time", which can mean anything.
> and a load that reaches the cache layer is guaranteed to see the last version of the data that was stored in that memory location by any coherent agent in the system.
Yes.
> As pointed out elsethread, you need load barriers specifically to order your loads, so they are needed for ordering, not visibility.
Mmm - again, I may be wrong; but I think there are also no guarantees about how cache invalidation requests are handled. Given no guarantee, then in theory an invalidation request is never handled (unless you force it to be with a read barrier).
When I need a set of data to be written (say a struct - something bigger than you can manage in an atomic op), I'll write to the struct, fence, then perform a throw-away atomic op. The atomic op forces a write (a normal write could just be deferred and not force completion of pre-barrier writes) and then I know my struct has gone out past the store buffer and has reached cache control.
liblfds-temp | 1 year ago | on: Making Sense of Acquire-Release Semantics
Yes. I mean, you need the load barrier to see stores safely. Also, in principle, since AFAIK there are no guarantees about how cache invalidation requests are handled, in theory a request could go unprocessed forever - in which case you would in fact literally never see the writes from other cores (until you issued a read barrier).
liblfds-temp | 1 year ago | on: Memory Consistency Models: A Tutorial
> Of course, this is exactly the behavior we were trying to avoid by introducing store buffers and other optimizations. Barriers are an escape hatch to be used sparingly: they can cost hundreds of cycles.
I may be wrong, but I think to readers who do not already know what barriers do and are, this makes barriers seem like blocking behaviours.
liblfds-temp | 1 year ago | on: Making Sense of Acquire-Release Semantics
However, "completed" is misleading.
The write will still not be seen by readers.
"Complete" really means "the writer has done all the work he can do, which is necessary but insufficient".
For a reader to see the write, the readers must issue a read memory barrier before reading the value.
All of this is due to store buffers and cache invalidation requests.
In short, in principle, a write is known about only by the processor it occurred on; if there are any other processors in the system, then unless special magic is performed (fences and so on), any writes they see from other processors are seen purely by chance and can appear in any order (including going backwards in time), or not appear at all.
I once read an article which framed all this in terms of source control.
Memory is like SVN, or Git.
You make local changes (on your processor, which has its local copy of memory). You then commit (write fence/atomic operation). No one else can see your changes until they update from source control and get the latest version (read fence).
liblfds-temp | 1 year ago | on: Making Sense of Acquire-Release Semantics
I may be completely wrong, it's a complicated subject, but I think the wording here is potentially misleading.
As I understand it (and I may be wrong!), a full fence does the following : "all reads and all writes prior to the fence must occur before any reads or writes after the fence".
What I'm concerned about is people thinking this is a blocking behaviour, i.e. when we hit the fence, then at that point all prior reads and writes occur.
This is not the case.
The reads and writes prior to the fence can occur at any time - they could occur LONG AFTER we pass the fence - but what we do get from the fence is that the reads and writes prior to the fence WILL occur BEFORE any reads or writes after the fence.
So, to put it another way, the code is executing, we come to the fence - and absolutely nothing happens. The processor just keeps going. No reads occur, no writes occur.
Then at some point later on, both in time and in code, the processor comes to another (say) read. NOW, finally, the processor MUST complete all reads and writes which were issued prior to the fence.
liblfds-temp | 2 years ago | on: An introduction to lockless algorithms (2021)
I've been working on other projects for the last number of years, but I'm finally back to liblfds and making progress toward the next release.
C declarations become unfriendly when they are complex and their qualifiers are out of order.