top | item 33477604

raible | 3 years ago

FWIW I use a "fake" linked list by adding a .next field to the structs in a compiler-allocated array. On initialization I set the list head to &item[0], and in a trivial loop set each entry's .next field (except the last's) to entry+1.

Why bother? Because I can then easily add a few extra structs to the beginning of the (contiguously-allocated) linked list without having to reallocate the whole thing.

Sure, pointer chasing with separately allocated structs is "slow", but I haven't yet measured whether it's any different when (almost all) items are contiguous.

If you would...

- What sort of cache behavior should one expect of this on a modern laptop CPU?

- I haven't seen this approach before. Have you?

Sirened | 3 years ago

It depends on which prefetchers your CPU has and the actual access pattern your .next chain produces. If chasing .next leads to a constant-stride run through the array, you'll get performance identical to an iterative walk, since essentially every high-performance CPU since like 2010 supports stride prediction based on raw addresses. If .next sends you bouncing all over the array, you'll get complicated behavior: whether or not the data can be prefetched depends on the performance/existence of a pointer prefetcher, which is less common and more error-prone. We know Apple's M1 has one because some researchers used it as a side channel [1], but you'll have to do some digging to find out whether your laptop has one. Would make a nice post here if you do run the benchmarks :)

[1] https://www.prefetchers.info/augury.pdf

raible | 3 years ago

Thanks for that, it's what I had hoped for (but again, have not yet measured).

It seems to me it's a super-handy way of "modifying" a compiler-allocated array of structs. I'm sticking with it!