Hash Array Mapped Tries (HAMT) to the Rescue

65 points| adlrocha | 5 years ago |adlrocha.substack.com

31 comments

[+] omginternets|5 years ago|reply
Apart from the Steindorfer and Vinju paper [0], does anybody have any resources for implementing the CHAMP variant of HAMT tries?

Better yet, has anybody here implemented one before, and might I pick your brain?

[0] https://michael.steindorfer.name/publications/oopsla15.pdf

[+] ianopolous|5 years ago|reply
I've implemented CHAMP [0] for Peergos in the IPLD/IPFS setting, largely based on the Steindorfer paper. There is one improvement that whyrusleeping from IPFS came up with, which is to allow a small number of hash collisions in a level before pushing things down a level.

[0] https://github.com/Peergos/Peergos/blob/master/src/peergos/s...
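For the curious, the structural trick that distinguishes CHAMP from a classic HAMT node can be sketched roughly like this (Go; all names are illustrative, based on my reading of the Steindorfer & Vinju paper, not Peergos's actual code):

```go
package main

import (
	"fmt"
	"math/bits"
)

// A CHAMP node keeps two bitmaps where a classic HAMT keeps one:
// dataMap marks slots holding key/value payloads inline, nodeMap
// marks slots holding sub-node pointers. Payloads and sub-nodes
// live in disjoint regions of one compact array, which improves
// locality and gives nodes a canonical form.
type champNode struct {
	dataMap uint32
	nodeMap uint32
	// content layout: [payload0, payload1, ..., subN, ..., sub1, sub0]
	content []interface{}
}

// dataIndex: position of a payload = popcount of dataMap bits below it.
func (n *champNode) dataIndex(bit uint32) int {
	return bits.OnesCount32(n.dataMap & (bit - 1))
}

// nodeIndex: sub-nodes are stored in reverse order at the array's end.
func (n *champNode) nodeIndex(bit uint32) int {
	return len(n.content) - 1 - bits.OnesCount32(n.nodeMap&(bit-1))
}

func main() {
	n := &champNode{
		dataMap: 0b0001, // slot 0 holds a payload
		nodeMap: 0b0100, // slot 2 holds a sub-node
		content: []interface{}{"payload-0", "subnode-2"},
	}
	fmt.Println(n.dataIndex(1 << 0)) // 0
	fmt.Println(n.nodeIndex(1 << 2)) // 1
}
```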

[+] joshlemer|5 years ago|reply
I have done more than enough work on the CHAMP-encoded hashmap in Scala 2.13 to call myself a co-author.
[+] invisiblerobot|5 years ago|reply
Clojure has an implementation of these in Java.
[+] huhnmonster|5 years ago|reply
Can someone explain the reasoning for using a HAMT here? I have played around with hashtables quite a bit, but although HAMTs are cool and a lot more memory efficient, they are likely not faster and will at some point suffer from more severe memory fragmentation than hashtables.

Is the concern speed or memory? Because speed would seem a strange motivation: after changing to the new architecture, they should see a strong decrease in messages being exchanged anyway, right?

Or are go's maps inefficient? Genuinely curious

[+] fsloth|5 years ago|reply
HAMT is excellent for a persistent dictionary. Not persistent as in write-to-disk, but as in immutable-yet-small.
[+] masklinn|5 years ago|reply
HAMTs are efficient for persistent data structures, compared to more usual trees.

They're slower and less memory-efficient than a mutable hashmap, but far better than one if you need to keep the "old revisions" around.
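A toy illustration of the path-copying that makes keeping old revisions cheap (Go; a simplified binary trie standing in for a 32-way HAMT, all names illustrative):

```go
package main

import "fmt"

// A toy persistent binary trie: "insert" copies only the nodes on
// the path from root to leaf, sharing every untouched subtree with
// the old version. HAMTs apply the same idea with 32-way branching,
// so each update copies O(log32 n) small nodes instead of the map.
type trie struct {
	children [2]*trie
	value    string // set at the leaf (fixed depth in this toy)
}

// insert returns a NEW root; t itself is never mutated.
func insert(t *trie, key uint, depth int, v string) *trie {
	n := &trie{}
	if t != nil {
		*n = *t // shallow copy: shares the untouched child
	}
	if depth == 0 {
		n.value = v
		return n
	}
	bit := key & 1
	n.children[bit] = insert(n.children[bit], key>>1, depth-1, v)
	return n
}

func get(t *trie, key uint, depth int) string {
	if t == nil {
		return ""
	}
	if depth == 0 {
		return t.value
	}
	return get(t.children[key&1], key>>1, depth-1)
}

func main() {
	v1 := insert(nil, 0b101, 3, "old")
	v2 := insert(v1, 0b101, 3, "new") // v1 stays intact
	fmt.Println(get(v1, 0b101, 3), get(v2, 0b101, 3)) // old new
}
```

Both revisions stay fully usable; only three small nodes were copied for the update.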

[+] jsdavo|5 years ago|reply
> they are likely not faster

If they are cache friendly then they can be very fast. The papers by Phil Bagwell (may he rest in peace) are very rich. Lots of implementation detail. Check them out!

[+] sfink|5 years ago|reply
Maybe I'm just missing it, but I don't see in this description any explanation of how to handle hash collisions, i.e., when two keys map to the same hash value. Do you iterate through an infinite series of secondary hash functions when you run out of distinguishing bits? Or do you bottom out at a table that you linearly scan?

Overall, it seems like the algorithm is "maintain a trie on the hash values, using an auxiliary bitmap and popcnt to determine the index within any trie node." Does that miss anything?
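That summary matches the usual description. The bitmap-and-popcnt indexing within a node might look roughly like this (Go; names are illustrative, not from the article):

```go
package main

import (
	"fmt"
	"math/bits"
)

// Each trie node stores a 32-bit bitmap with one bit per possible
// child slot, plus a dense array holding only the occupied slots.
type node struct {
	bitmap   uint32
	children []interface{} // occupied slots, in bit order
}

// slot extracts the next 5 hash bits at the given depth.
func slot(hash uint32, shift uint) uint32 {
	return (hash >> shift) & 0x1f
}

// lookup maps a slot to its dense-array index, if occupied:
// count the 1-bits strictly below the slot's bit (popcnt).
func (n *node) lookup(s uint32) (int, bool) {
	bit := uint32(1) << s
	if n.bitmap&bit == 0 {
		return 0, false
	}
	return bits.OnesCount32(n.bitmap & (bit - 1)), true
}

func main() {
	// bits 1, 4, 6 are set; slot 4 is the second occupied slot.
	n := &node{bitmap: 0b1010010, children: []interface{}{"a", "b", "c"}}
	idx, ok := n.lookup(4)
	fmt.Println(idx, ok) // 1 true
	_ = slot
}
```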