What advantage would a trie's prefix searching give you in a word count task? Worse, you're following word.length() pointers, potentially all over your memory. You could collapse with Patricia or Merkle or whatever, but they'll all be worse than the one pointer lookup of a hash table (modulo collision chasing if you pick a bad hash/size) & all in the service of giving you a feature (prefix lookup) you don't need.
The comments explain why I think it's slower. The TL;DR is that using a trie requires more memory accesses than a hash table (per byte of input), and those memory accesses slow the whole enterprise down.
But, not all memory accesses are created equal. So perhaps a more clever representation that exploits locality better would do better. "Obvious" techniques for shrinking memory usage made a big difference and brought the performance of the trie program closer to C: https://github.com/benhoyt/countwords/pull/2
throw_away|5 years ago
burntsushi|5 years ago
The comments explain why I think it's slower. The TL;DR is that using a trie requires more memory accesses than a hash table (per byte of input), and those memory accesses slow the whole enterprise down.
But, not all memory accesses are created equal. So perhaps a more clever representation that exploits locality better would do better. "Obvious" techniques for shrinking memory usage made a big difference and brought the performance of the trie program closer to C: https://github.com/benhoyt/countwords/pull/2
indweller|5 years ago