Always surprising to click through a link on HN and discover it is one's own work. For a time I was very interested in lightweight array-based implementations of common data structures and this one seemed particularly handy.
It sounds a little like it didn’t work out as well as you hoped. How did it fare?
I am interested because I have some scientific software (astrodynamics - propagating orbits in space) that would really benefit from a cache-friendly tree, and this seemed promising.
This is the representation I usually see used for the tree of nodes for skinning 3D models. There each node has a transform, and the most common operation is to recompute the world transform for all nodes, formed by composing each node's transform with those of all its ancestors. If the array is sorted so that parents always precede their children, that's just a straight loop over the arrays:

    for i = 1, num_nodes do
        if parents[i] == -1 then
            world_xforms[i] = xforms[i]
        else
            world_xforms[i] = world_xforms[parents[i]] * xforms[i]
        end
    end
Lots of comments about iterating over children being O(N) for this style of tree. It's actually easy to generalize the atree design, e.g. by adding pointers for "first child" and "next sibling" and potentially removing the parent pointers, if that's what your application needs. I think the "Operations in pseudocode" section should simply state that there's no O(1) way to access the children of a node - instead of recommending the O(N) approach, it should recommend changing the data structure to support the operations you need.
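The generalization described above can be sketched like this (a hypothetical Python sketch, not code from the post): a "first child" / "next sibling" pair of arrays alongside the parent array makes iterating a node's children O(number of children) instead of O(N):

```python
NIL = -1  # sentinel index, plays the role of a null pointer

class ATree:
    def __init__(self):
        self.parent = []
        self.first_child = []
        self.next_sibling = []

    def add_node(self, parent=NIL):
        i = len(self.parent)
        self.parent.append(parent)
        self.first_child.append(NIL)
        self.next_sibling.append(NIL)
        if parent != NIL:
            # Prepend to the parent's child list: O(1).
            self.next_sibling[i] = self.first_child[parent]
            self.first_child[parent] = i
        return i

    def children(self, i):
        c = self.first_child[i]
        while c != NIL:
            yield c
            c = self.next_sibling[c]

t = ATree()
root = t.add_node()
a = t.add_node(root)
b = t.add_node(root)
list(t.children(root))  # children come out newest-first: [b, a]
```

Because new children are prepended, iteration visits them in reverse insertion order; keeping a "last child" array as well would preserve insertion order at the cost of one more array.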
Storing nodes in arrays and using indices for pointers is a must whenever you're implementing algorithms on trees. I typically prefer an array of structs, putting the key and the parent index next to each other, instead of putting them in separate arrays. If you need to access the children of a node, consider whether you can save memory by having a separate structure for leaves - remember that over half of the nodes will be leaves, so storing a child pointer on both internal nodes and leaves can be a wasteful use of memory.
One way to maintain leaves efficiently (amortised O(1) time and space to add/remove) is to keep 2 extra vectors that "point at each other": L[] is a vector containing the indices of all leaves, and LP[] is a vector such that LP[i] stores the index within L[] of the value i if i is a leaf, and -1 otherwise. That is, for i a non-leaf, LP[i] == -1, while for i a leaf, L[LP[i]] == i.
To find the indices of all leaves: That's just L[].
To add a leaf i: Append i to L[], and set LP[i] to L.length - 1.
To remove a leaf i in constant time: Swap L[LP[i]] with the last element of L[] (updating LP[] for the element that moved), pop L[], and set LP[i] to -1.
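A quick Python sketch of the L[]/LP[] scheme (removal uses the standard swap-with-last trick):

```python
class LeafSet:
    def __init__(self, num_nodes):
        self.L = []                 # indices of all current leaves
        self.LP = [-1] * num_nodes  # LP[i] = position of i within L, or -1

    def is_leaf(self, i):
        return self.LP[i] != -1

    def add(self, i):
        self.LP[i] = len(self.L)
        self.L.append(i)

    def remove(self, i):
        # Swap i with the last leaf in L, then pop: O(1).
        pos = self.LP[i]
        last = self.L[-1]
        self.L[pos] = last
        self.LP[last] = pos
        self.L.pop()
        self.LP[i] = -1

s = LeafSet(5)
for i in (0, 2, 4):
    s.add(i)
s.remove(2)
sorted(s.L)  # [0, 4]
```

Note the invariant from the comment holds throughout: L[LP[i]] == i for every leaf i, and LP[i] == -1 otherwise.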
I find it a bit odd that this tree representation as a parents array (which is, I think, the most basic representation covered in any CS course) got so much traction on HN. It goes to show how far a good presentation can drive a trivial idea. On top of that, it casually presents suboptimal procedures for a lot of essential operations, without going into much detail about the impact of that suboptimality. Good PR, I guess…
I think in this context, "pointer-free" is meant to imply (spatial) locality of reference, no additional memory allocations, address-independent and presumably memory safety (but I didn't read the code and may be wrong about the last one).
It's not quite so at the hardware level. Many architectures, including the ubiquitous x64, can add a direct offset to a memory-access operation (like mov), or efficiently compute an address from a base, offset and element size (like lea). This removes an extra pointer dereference, which can be hugely expensive.
With a cache-friendly structure like an array, the one dereference you may need for the array access has a high chance of being served from L1 or L2, saving you a lot of clocks, because RAM has huge latency.
There are technically correct nitpicks with some merit and there are trivial remarks like yours. It is perfectly clear what "pointer-free" was referring to in the post title.
Looks nice. That said, if you want to reduce mallocs and encourage data locality, maybe you could try a more traditional tree implementation backed by a pool of nodes, similar to a thread pool. I've found that most of my allocation- and locality-related problems were solved by such techniques.
A custom pool for objects is often a good idea. Pre-allocate “enough” memory and dish them out as appropriate, releasing back to the pool when the reference count goes to zero.
You can also use the pool to bound the maximum size you’re able to allow.
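A minimal sketch of such a pool in Python (using an intrusive free list and explicit free rather than the reference counting mentioned above; all names are made up):

```python
class NodePool:
    def __init__(self, capacity):
        # Pre-allocate "enough" slots up front; capacity also bounds
        # the maximum number of live nodes.
        self.data = [None] * capacity
        # Each free slot stores the index of the next free slot.
        self.next_free = list(range(1, capacity)) + [-1]
        self.free_head = 0

    def alloc(self, value):
        if self.free_head == -1:
            raise MemoryError("pool exhausted")
        i = self.free_head
        self.free_head = self.next_free[i]
        self.data[i] = value
        return i

    def free(self, i):
        # Release the slot back to the pool for reuse.
        self.data[i] = None
        self.next_free[i] = self.free_head
        self.free_head = i

pool = NodePool(3)
a = pool.alloc("x")   # slot 0
b = pool.alloc("y")   # slot 1
pool.free(a)
c = pool.alloc("z")   # reuses slot 0
```

Because the free list lives inside the slot array itself, alloc and free are O(1) with no extra allocation.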
One change I'd make to this structure would be to pack all the children of a node together. That way you could find out the list of all children of a node with a binary search, and they'd have some cache-friendly properties.
Inserting a child would need a memmove in O(N), but if edits are rare after an initial build it wouldn't be that bad.
You could also leave slack space where new entries are likely to go. With aggressive rebalancing you should only need a full reallocation when the tree is actually getting pretty full.
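One way to realize the packed-children idea (a sketch under the assumption that nodes are stored sorted by parent index, which is not spelled out in the comment): with that layout, each node's children occupy a contiguous range that bisect finds in O(log N):

```python
import bisect

# parents[] sorted ascending; node i's payload lives at index i in a
# parallel array. -1 marks the root, which sorts first.
parents = [-1, 0, 0, 1, 1, 2]

def children(p):
    # All entries equal to p form one contiguous, cache-friendly run.
    lo = bisect.bisect_left(parents, p)
    hi = bisect.bisect_right(parents, p)
    return range(lo, hi)

list(children(0))  # -> [1, 2]
```

Inserting a node still means shifting everything after it (the O(N) memmove mentioned above), so this fits build-once, query-many workloads.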
I was about to comment that this reminds me of Aaron Hsu’s Dyalog compiler, but it’s mentioned right there in the README as one of the inspirations.
IMHO, compilers are about an order of magnitude slower than they could be, and this type of tree representation could be one key element for fixing that.
One interesting approach would be to combine egg[1] with this style of vectorised trees for efficient optimisation passes.
[1] https://arxiv.org/pdf/2004.03082.pdf
What is an array index if not a "pointer"? In fact, accessing an array member involves pointer arithmetic:
array_p + index * data_size
The fact that traditional tree implementations require malloc is based solely on the wish to dynamically add and remove nodes. If the tree can have a limited size with no frequent delete operations, an array implementation is fine. Otherwise, expanding the array can be a very costly or even impossible operation in some circumstances. And full scanning is not efficient.
A lot of people commenting about "an index into an array is also a pointer", I thought people commonly referred to integer indexes of this kind as handles, or index-handles? (like in this article [1]).
This way of representing trees reminds me of two classic data structures: heaps [2] and disjoint-sets [3].
--
1: https://floooh.github.io/2018/06/17/handles-vs-pointers.html
2: https://en.wikipedia.org/wiki/Heap_(data_structure)
3: https://en.wikipedia.org/wiki/Disjoint-set_data_structure
But the parent indices are pointers. Not in the index-into-all-memory sense but in the offset-into-array sense. They’re smaller, type safe, inherently well packed, and can’t point outside the instance of the data structure in question if they’re bounds checked. But I would still think of them as a sort of pointer.
So this is just a tree, with only a parent pointer, in SOA (Struct of Arrays) form, and iterable (in no particular order). Which is maybe useful if you want that specific data structure.
And it’s utterly useless if you want, say, a tree with children and need better-than-O(n) access to children. Which you probably do need. Of course, you can add child pointers in SOA form if you want.
I think you could achieve something close to O(log N) with sorted arrays (I haven't sat down and thought it through). You'd have the downside of inserting things in the middle, but I think that cost would be outweighed by cache efficiency elsewhere for many use cases.
I feel like the atree in TFA is just giving a name to something I do all the time when it makes sense to do so. For example, I used something similar recently when building an ECS entity tree for a game engine.
I made my own attempt at the same kind of idea at https://elmalabarista.com/blog/2022-flat-tree/.
It's pretty simple, actually. What I have observed (and what my tree exploits) is that most trees are "too much": I only care about pre-order layout, and a sequential scan is the major operation.
So the major takeaway is: if your case is simple, Go! Go! Vec!
Conceptually, yes. There are a lot of finer points to this concept, though. One of the points most relevant here is that accessing RAM is not _actually_ an O(1) operation, due to CPU caches, which speed up access to "close together" data. So if you manually ensure that data is close together (by putting it all in one array), you can massively speed up your program in the real world, because now all your keys/data can be squeezed into the cache at once.
Pointers are a bit more abstract than that. Since each program has a virtual address space, and said address space is made up of memory pages, even raw pointers need to be translated to the physical address via TLBs. In that sense, indices into arrays are more like pointers than the other way around.
I'm curious about the performance of separating the data from the pointers. On one hand, homogeneous arrays pack better. On the other, the traditional version that keeps the data and child pointers next to each other might have better cache locality.
I'm surprised nobody has pointed out what seems to be a fairly obvious limitation: the children within a parent cannot be ordered (unless you want to splice them into the middle of the arrays when inserting, which would be pretty inefficient).
A more useful, storage- and serialization-friendly pointer, perhaps.
https://www.youtube.com/watch?v=hzPd3umu78g https://www.youtube.com/watch?v=X5_5MtOYNos
https://www.dropbox.com/s/mmi46lxk8ipoci0/Principles%20of%20...
Arthur Whitney is the origin of much of that family of languages. Bryan Cantrill’s interview with him is good: https://dl.acm.org/doi/pdf/10.1145/1515964.1531242
K is the lower-level language and is a variant of APL. Q is the higher-level one, bringing in elements of SQL.
J is another APL-like programming language, unrelated to KDB, by the actual creator of APL.
All are easy to google by adding "programming language" to your query, and each has a Wikipedia page.