liblfds-temp's comments
liblfds-temp | 1 year ago | on: Making Sense of Acquire-Release Semantics
I would disagree. I think the Intel docs do not specify a guarantee of flushing, and so if the SP and SC are on different cores, then in principle (though not in practice) the SC could in fact never see what the SP emits.
liblfds-temp | 1 year ago | on: C Style: My favorite C programming practices (2014)
Read the type for all the below right-to-left, substituting the word "pointer" for "*".
int long long unsigned wibble; // unsigned long long int
double const *long_number; // pointer to a const double
double volatile * const immutable_pointer; // immutable pointer to a volatile double
They all read correctly now, when read right-to-left. It's not just "const" you do this for, as per the advice. Do it for all qualifiers.
liblfds-temp | 1 year ago | on: Making Sense of Acquire-Release Semantics
That's the thing - I may be wrong, but I'm under the impression store buffers do not guarantee anything. I'm pretty certain they do not on Intel, for example. All you read in the docs is "flushed in a reasonable time", which can mean anything.
> and a load that reaches the cache layer is guaranteed to see the last version of the data that was stored in that memory location by any coherent agent in the system.
Yes.
> As pointed out elsethread, you need load barriers specifically to order your loads, so they are needed for ordering, not visibility.
Mmm - again, I may be wrong; but I think there are also no guarantees about how cache invalidation requests are handled. Given no guarantee, then in theory an invalidation request is never handled (unless you force it to be with a read barrier).
When I need a set of data to be written (say a struct - something bigger than you can manage in an atomic op), I'll write to the struct, fence, then perform a throw-away atomic op. The atomic op forces a write (a normal write could just be deferred and not force completion of pre-barrier writes) and then I know my struct has gone out past the store buffer and has reached cache control.
liblfds-temp | 1 year ago | on: Making Sense of Acquire-Release Semantics
Yes. I mean, you need the load barrier to see stores safely. Also, in principle, since AFAIK there are no guarantees about how cache invalidation requests are handled, in theory a request could go unprocessed forever - in which case you would in fact literally never see the writes from other cores (until you issued a read barrier).
liblfds-temp | 1 year ago | on: Memory Consistency Models: A Tutorial
> Of course, this is exactly the behavior we were trying to avoid by introducing store buffers and other optimizations. Barriers are an escape hatch to be used sparingly: they can cost hundreds of cycles.
I may be wrong, but I think to readers who do not already know what barriers do and are, this makes barriers seem like blocking behaviours.
liblfds-temp | 1 year ago | on: Making Sense of Acquire-Release Semantics
However, "completed" is misleading.
The write will still not be seen by readers.
"Complete" really means "the writer has done all the work he can do, which is necessary but insufficient".
For a reader to see the write, the readers must issue a read memory barrier before reading the value.
All of this is due to store buffers and cache invalidation requests.
In short, in principle, a write is known about only by the processor it occurred on; if there are any other processors in the system, then unless special magic is performed (fences and so on), any writes they see from other processors are seen purely by chance and can appear in any order (including going backwards in time), or not appear at all.
I once read an article which framed all this in terms of source control.
Memory is like SVN, or Git.
You make local changes (on your processor, which has its local copy of memory). You then commit (write fence/atomic operation). No one else can see your changes until they update from source control and get the latest version (read fence).
liblfds-temp | 1 year ago | on: Making Sense of Acquire-Release Semantics
I may be completely wrong, it's a complicated subject, but I think the wording here is potentially misleading.
As I understand it (and I may be wrong!), a full fence does the following : "all reads and all writes prior to the fence must occur before any reads or writes after the fence".
What I'm concerned about is people thinking this is a blocking behaviour, i.e. when we hit the fence, then at that point all prior reads and writes occur.
This is not the case.
The reads and writes prior to the fence can occur at any time - they could occur LONG AFTER we pass the fence - but what we do get from the fence is that the reads and writes prior to the fence WILL occur BEFORE any reads or writes after the fence.
So, to put it another way, the code is executing, we come to the fence - and absolutely nothing happens. The processor just keeps going. No reads occur, no writes occur.
Then at some point later on, both in time and in code, the processor comes to another (say) read. NOW, finally, the processor MUST complete all reads and writes which were issued prior to the fence.
liblfds-temp | 2 years ago | on: An introduction to lockless algorithms (2021)
I've been working on other projects for the last number of years, but I'm finally back to liblfds and making progress toward the next release.
C declarations become unfriendly when they are complex and their qualifiers are out of order.