top | item 26468470

(no title)

indweller | 5 years ago

Can someone explain why trie is not faster than hash table?

discuss

What advantage would a trie's prefix searching give you in a word count task? Worse, you're following word.length() pointers, potentially all over your memory. You could collapse with Patricia or Merkle or whatever, but they'll all be worse than the one pointer lookup of a hash table (modulo collision chasing if you pick a bad hash/size) & all in the service of giving you a feature (prefix lookup) you don't need.

burntsushi|5 years ago

The source code of the trie program I wrote is here: https://github.com/benhoyt/countwords/blob/master/rust/optim...

The comments explain why I think it's slower. The TL;DR is that using a trie requires more memory accesses than a hash table (per byte of input), and those memory accesses slow the whole enterprise down.

But, not all memory accesses are created equal. So perhaps a more clever representation that exploits locality better would do better. "Obvious" techniques for shrinking memory usage made a big difference and brought the performance of the trie program closer to C: https://github.com/benhoyt/countwords/pull/2

indweller|5 years ago

Yeah, I get it now. Thanks!