top | item 43680100

(no title)

steventhedev | 10 months ago

This feels like a net negative result. It removes some of the complexity of using generics, but it couples between the data type and the collections it can be indexed by.

What are the benefits of this approach? Is it limited to data alignment, or is it out of a greater desire to remove generics?

discuss

order

whizzter|10 months ago

Intrusive linked lists as this is called removes unnecessary allocations and traversal (main reason why linked lists have such a horrible reputation), say a hypothetical example where you have a temporary listener object installed that listens to broadcasts on channel X and times out after 5 minutes.

Upon connecting the object is connected to the channel X linked list as well as some other list of objects that are killed at the timestamp 5 minutes in the future.

With an intrusive linked list the link-node resides within the object, the only thing needed when installing is linking it into the 2 lists (this is a few move-operations), an external linked list would _require 2 extra allocations_ for the linked-list nodes.

Broadcasting from X is almost the same since it's a simple traversal, albeit with double the cache pressure since the object and the linked-list node probably lives separately.

The real HORROR comes when disconnecting, if channel X has millions of listeners it could become a real bottleneck to traverse the linked list to find the link-node that connects the actual object since there could be tons of jumping around memory. An intrusive linked list would just disconnect itself if it's doubly-linked.

This is why hashsets/tables,vectors/arraylists,etc are often used in practice (because many "modern" OO languages never added good support for the machinery needed here) making linked lists look quite bad (there is other badness but using non-intrusive linked lists is almost always worse than using something else than a linked list altogether).

steventhedev|10 months ago

The generic version in TFA puts the data type allocated alongside the next pointer - no additional allocation needed. The only functional difference is if the zig compiler is not sufficiently advanced to understand it can reorder the fields (hence the alignment question).

The removal scenario is merely specifying that you are passing in ConnectionListNode instead of a Connection. Although maybe it's a good idea to think about how they compose comparatively.

flohofwoe|10 months ago

It's not at all unusual for intrusive linked lists though?

On AmigaOS (which is entirely built out of intrusive doubly linked list) the list node is placed at the start of a struct, so the pointer to the node is also the pointer to the linked item. There's also no 'coupling' because the list manipulation functions take pointers to the list node structs, but they don't need to know the 'outer' item struct.

Zig's @fieldParentPtr is more flexible since the node struct can be anywhere in the parent struct and you still can get from the node pointer to the item base pointer (and more importantly, it makes it trivial to link the same item into multiple linked lists).

Joker_vD|10 months ago

It's not unusual at all, it's just... should it be exposed? I personally prefer having "Node struct with next pointer + inlined generic data" design when it can be encapsulated away since, well, it can be encapsulated away, and the result data layout is the same. And when you expose this design, well, you end with all sorts of disasters, including OOP inheritance (no, really: [0]).

[0] https://catern.com/inheritance.html

codr7|10 months ago

This approach is also common in C:

https://github.com/codr7/hacktical-c/tree/main/list

One pretty big advantage is you get some help from the compiler making sure you mean what you say. As opposed to just blindly casting items and assuming no one forgot to put the list in their struct. Also the possibility to be part of multiple lists.

GolDDranks|10 months ago

I think it matches really well for the Zig ethos of "artisanal, minimal code". More than often, a type is strongly tied to how it is being used in the code base. In that sense, having it be tightly coupled to a data structure isn't much of a problem. The opposite isn't true, and the data structure is independent of the embedding parent data. The implementation of that data structure itself can still be presented as a library, and the "generic" parts carefully tested.

reissbaker|10 months ago

Zig has no desire to remove comptime AFAIK (comptime being the larger Zig feature by which people implement generics in the language) — comptime is pretty much the reason to use Zig.

The benefits of intrusive linked lists are higher performance; you can use comptime to still have decoupled code.

surajrmal|10 months ago

The generic approach should be the same performance. This approach just lets you place your data in multiple lists without needing multiple allocations.

Someone|10 months ago

The only logical explanation I can see is that these are two decisions:

- linked lists aren’t useful on modern systems because traversing them causes to many cache misses. Therefore, we shouldn’t provide such a simple implementation.

- but we need one in low level OS code, and there, intrusive lists are preferred. Their limitation that you cannot store an object in multiple lists isn’t a problem, and the extra speed and the fact that you can move items between lists without allocations is desired.

I don’t see why the intrusive implementation has to be so basic, though. Can’t you, in Zig, express that a generic type T used in a generic list class has to have a nullable next field that points to a T?

throwawaymaths|10 months ago

> linked lists aren’t useful on modern systems because traversing them causes to many cache misses

Only if you "default to using" (and only use) malloc. Zig encourages you to use different allocators within the same program for different use cases, including, potentially allocators which will thrash the cache far less (for example thread local arenas).

yccs27|10 months ago

You could create a comptime function that adds the Node field to any type. I guess then you've arrived back at the previous generic version.