
The Trie: A Neglected Data Structure

99 points | bbeneschott | 12 years ago | toptal.com | reply

82 comments

[+] tmoertel|12 years ago|reply
The original article uses Java, which makes it harder to see how simple it can be to make and use prefix trees. For comparison, here's the Python code to create a prefix tree from a set of words:

    def prefix_tree(words):
        d = dict()
        for word in words:
            root = d
            for c in word:
                root = root.setdefault(c, {})
            root['$$'] = {}  # mark end of word with '$$' token
        return d
For example:

    >>> prefix_tree('foo bar baz'.split())
    {'b': {'a': {'r': {'$$': {}},
                 'z': {'$$': {}}}},
     'f': {'o': {'o': {'$$': {}}}}}
To query whether a word is present in a tree you basically compute

    reduce(dict.__getitem__, word, tree)
and test whether the result (a forest) has an end-of-word '$$' tree.
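
Wrapped up as a self-contained membership test (a sketch; note that on Python 3, `reduce` is no longer a builtin and must be imported from `functools`):

```python
from functools import reduce  # builtin on Python 2; functools on Python 3

def prefix_tree(words):
    d = dict()
    for word in words:
        root = d
        for c in word:
            root = root.setdefault(c, {})
        root['$$'] = {}  # mark end of word with '$$' token
    return d

def contains(tree, word):
    # Walk down one character at a time; a KeyError means the
    # prefix isn't in the tree at all.
    try:
        return '$$' in reduce(dict.__getitem__, word, tree)
    except KeyError:
        return False
```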
[+] leif|12 years ago|reply
Tries (actually pronounced "trees"; also called Patricia trees) are very memory efficient compared to most alternatives, but really only shine in membership-query workloads. If you actually need to return the stored element, you have to rebuild it for every query, which is probably too expensive if you have lots of queries; it would be better to store the whole object contiguously somewhere in the data structure so you can return a reference to it.

Even then, tries only win if the distribution is suitable to give you the memory efficiency you are hoping for. If you don't have many common prefixes, you're better off with just a hashtable.

[+] nly|12 years ago|reply
If your key type is lexically sorted, you can also do prefix searches in a binary search tree.
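To illustrate the idea, here's a sketch with a sorted list standing in for an in-order BST (`find_with_prefix` is a hypothetical helper name; assumes a non-empty prefix):

```python
import bisect

def find_with_prefix(sorted_words, prefix):
    """Return all words starting with prefix, given a lexically
    sorted list (a stand-in here for a binary search tree)."""
    lo = bisect.bisect_left(sorted_words, prefix)
    # Everything with this prefix sorts strictly below the prefix
    # with its last character bumped to the next code point.
    hi = bisect.bisect_left(sorted_words,
                            prefix[:-1] + chr(ord(prefix[-1]) + 1))
    return sorted_words[lo:hi]
```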
[+] acbellini|12 years ago|reply
I reported the "try" pronunciation because that's the way I learnt it, and also because that's the one Sedgewick's book explicitly uses, but I guess there are alternatives :-)

Anyway, you got exactly the point I was trying to make: a trie is definitely not a general-purpose data structure, but for the specific case (limited number of elements, many common prefixes, etc.) it performs very well. As for returning the original object, it suffices to store it in the end-of-word nodes. I didn't do that in this code because it would have added unnecessary code, since the problem was really just membership and prefixes.

In this specific case, a hashtable wouldn't have worked, since it isn't sorted and doesn't solve the original problem.

[+] chatman|12 years ago|reply
For set membership checks, Bloom filters are way more memory efficient, and almost accurate: several Bloom filters of different sizes holding the same data can bring the false-positive rate down to negligible levels.
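A toy Bloom filter to illustrate the trade-off (salted SHA-1 digests standing in for k independent hash functions; the parameters are arbitrary):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: m bits, k hash functions.
    False positives are possible; false negatives are not."""

    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.bits = 0  # the m-bit array, stored as a big int

    def _positions(self, item):
        # Derive k bit positions from salted SHA-1 digests.
        for i in range(self.k):
            h = hashlib.sha1(f'{i}:{item}'.encode()).digest()
            yield int.from_bytes(h[:8], 'big') % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all(self.bits >> p & 1 for p in self._positions(item))
```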
[+] zheng|12 years ago|reply
Thanks. While I think the pronunciation is lame and confusing, I grow tired of explaining to people that it is actually correct: they aren't called "trys" or "treys".
[+] AaronFriel|12 years ago|reply
You're only looking at one kind of trie; there are also Judy arrays, which have much better performance characteristics and (if we're abusing the notation) the same worst-case memory complexity.
[+] chatman|12 years ago|reply
A trie is a poor data structure in practice. There is no locality of reference to exploit: to look up a single key, almost the entire length of the memory blocks occupied by the trie needs to be accessed, almost at random (depending on the insertion technique). For keys inserted later on, the number of distinct blocks of memory that must be accessed for a single key is proportional to the length of the key.

A B-Tree is better due to better exploitation of locality of reference in memory.

[+] judofyr|12 years ago|reply
See also: HAT-trie by Askitis and Sinha.

Tries are the fastest tree-based data structures for managing strings in-memory, but are space-intensive. The burst-trie is almost as fast but reduces space by collapsing trie-chains into buckets. This is not, however, a cache-conscious approach and can lead to poor performance on current processors. In this paper, we introduce the HAT-trie, a cache-conscious trie-based data structure that is formed by carefully combining existing components.

[+] fogleman|12 years ago|reply
Worth mentioning that tries can be heavily compressed into DAWGs (Directed Acyclic Word Graphs) by eliminating common subtrees. I've used this in word games on iOS where I want a small memory footprint.

P.S. I love tries!
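A minimal sketch of the subtree-sharing idea, using dict-based tries (not a production DAWG builder; it keys on the identities of already-merged children, which works because the `seen` table keeps every canonical node alive):

```python
def prefix_tree(words):
    d = {}
    for word in words:
        node = d
        for c in word:
            node = node.setdefault(c, {})
        node['$$'] = {}  # end-of-word marker
    return d

def merge_subtrees(node, seen=None):
    """Collapse structurally identical subtrees bottom-up, so shared
    suffixes are stored once, turning the trie into a DAWG."""
    if seen is None:
        seen = {}
    for c, child in node.items():
        node[c] = merge_subtrees(child, seen)
    # After children are canonical, two equal subtrees have equal keys.
    key = tuple(sorted((c, id(child)) for c, child in node.items()))
    return seen.setdefault(key, node)
```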

[+] tieTYT|12 years ago|reply
I think the article was well written, and this is the standard example to use for a trie. But apart from this one, I never run into scenarios that make me think, "A trie is a good fit for this!" What are some other examples where tries come in handy?
[+] tmoertel|12 years ago|reply
I've used prefix trees to represent matching rules for network-routing rulesets. They're also handy when you need to compute intersections because you can perform a parallel depth-first search over two prefix trees and prune from the search any nodes that don't have a corresponding node in the other tree.

For a fun example of this last application, there was a recent Google Code Jam problem called "Alien Languages" [1]. My solution [2] basically counts the leaf nodes in the intersection of two prefix trees. (Note that we can compute the count on the fly during the search and need not actually construct the intersection.)

[1] http://code.google.com/codejam/contest/90101/dashboard#s=p0

[2] https://github.com/tmoertel/practice/blob/master/google-code...
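
The leaf-counting intersection can be sketched like this (a minimal illustration, not the linked solution, using dict-based prefix trees):

```python
def prefix_tree(words):
    d = {}
    for word in words:
        node = d
        for c in word:
            node = node.setdefault(c, {})
        node['$$'] = {}  # end-of-word marker
    return d

def count_common_words(t1, t2):
    """Parallel DFS over two prefix trees, pruning any branch that
    doesn't exist in both; counts end-of-word markers in the
    (never actually materialized) intersection."""
    return sum(1 if c == '$$' else count_common_words(t1[c], t2[c])
               for c in t1.keys() & t2.keys())
```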

[+] zb|12 years ago|reply
When I was developing firmware for Layer 3 network switches we made very heavy use of Patricia tries (it was originally pronounced like 'tree', BTW... stupid name) for the management of routing tables. They're a good fit because you typically have comparatively long common prefixes, especially within your own network.
[+] jdmichal|12 years ago|reply
I think the alternate name for tries, radix trees, provides a good hint to potential use-cases. Any problem where the data can be broken down and searched by radix would be a potential good fit. You could even break down data structures with a lot of limited-enumeration fields in a similar way, using a different field at each depth, and gaining deduplication for each node which follows the same path through the structure.
[+] zem|12 years ago|reply
word games are definitely the big one [a dawg, which is a packed form of a trie, is even better; it gains space efficiency at the cost of being 'final', in that you cannot modify it once packed. http://www.wutka.com/dawg.html].

but i've once used a trie to good effect, replacing a red-black tree with a depth-3 trie whose leaves were red-black trees, trading increased space usage for the ability to update the rbtree entries in parallel.

[+] supo|12 years ago|reply
With a basic Trie, you always have this problem of how to keep the pointers at each node. Naively, you can:

1) Keep an array of size equal to the alphabet size, indexed by the characters, holding pointers to successive nodes.

- but this blows the memory to O(N * alphabet_size) where N is the input size

2) Keep a linked list of pairs (character, pointer).

- but this blows the lookup time to O(word_length * alphabet_size)

3) Keep something like an STL map<character, pointer>.

- still suboptimal complexity of O(word_length * log alphabet_size) + it complicates the code + it dramatically increases the constant time per operation

Or you research a bit more and actually learn a way to properly implement a trie, called a Ternary Search Tree: http://en.wikipedia.org/wiki/Ternary_search_tree
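
A minimal ternary search tree sketch: each node holds one character plus lo/eq/hi links, so you get trie-like lookups without alphabet-sized child arrays:

```python
class TSTNode:
    __slots__ = ('char', 'lo', 'eq', 'hi', 'is_word')

    def __init__(self, char):
        self.char = char
        self.lo = self.eq = self.hi = None
        self.is_word = False

def tst_insert(node, word, i=0):
    c = word[i]
    if node is None:
        node = TSTNode(c)
    if c < node.char:
        node.lo = tst_insert(node.lo, word, i)      # branch left
    elif c > node.char:
        node.hi = tst_insert(node.hi, word, i)      # branch right
    elif i + 1 < len(word):
        node.eq = tst_insert(node.eq, word, i + 1)  # next character
    else:
        node.is_word = True                         # word ends here
    return node

def tst_contains(node, word, i=0):
    if node is None:
        return False
    c = word[i]
    if c < node.char:
        return tst_contains(node.lo, word, i)
    if c > node.char:
        return tst_contains(node.hi, word, i)
    if i + 1 < len(word):
        return tst_contains(node.eq, word, i + 1)
    return node.is_word
```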

[+] anaphor|12 years ago|reply
Or you decide that your problem is small enough that the poor memory complexity doesn't matter much.
[+] dnr|12 years ago|reply
If you like tries, take a look at the double-array trie: it's a pretty nifty (though somewhat complicated) way to greatly reduce the space usage of tries by interleaving nodes in the same block of memory. It also makes it easier to serialize a trie, so you can quickly load a pre-built one instead of re-doing all the inserts, which is useful for static dictionaries.

http://linux.thai.net/~thep/datrie/

[+] GhotiFish|12 years ago|reply
Thanks for the link! I wish I'd known about this implementation.
[+] munificent|12 years ago|reply
Tries haven't been entirely neglected: quadtrees and octrees are their 2D and 3D analogues, and those are very common in games, renderers, and other spatial simulations.
[+] Scaevolus|12 years ago|reply
Quadtrees and octrees are far more related to binary space partitioning than tries.
[+] pathikrit|12 years ago|reply
[DAWGs](http://en.wikipedia.org/wiki/Directed_acyclic_word_graph) are a special kind of trie where similar child trees are compressed into single parents. I extended DAWGs and came up with a nifty data structure called ASSDAWG (Anagram Search Sorted DAWG). The way it works is that whenever a string is inserted into the DAWG, it is bucket-sorted first and then inserted, and the leaf nodes hold an additional number indicating which permutations are valid if we reach that leaf node from the root. This has two nifty advantages:

1) Since I sort the strings before insertion, and since DAWGs naturally collapse similar subtrees, I get a high level of compression (e.g. "eat", "ate", "tea" all become one path a-e-t with a list of numbers at the leaf node indicating which permutations of a-e-t are valid).

2) Searching for anagrams of a given string is super fast and trivial now, as a path from root to leaf holds all the valid anagrams of that path at the leaf node, via the permutation numbers.
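The sort-before-insert idea can be shown with a much simpler structure (this is just a dict keyed on sorted letters, not the commenter's ASSDAWG; all anagrams of a word share the same sorted key):

```python
from collections import defaultdict

def build_anagram_index(words):
    # Bucket words by their sorted letters, so anagrams collide.
    index = defaultdict(list)
    for w in words:
        index[''.join(sorted(w))].append(w)
    return index

def anagrams(index, word):
    # All stored words sharing this word's sorted-letter key.
    return index.get(''.join(sorted(word)), [])
```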

[+] anaphor|12 years ago|reply
I just used a Trie yesterday to implement some string matching that I needed to do in a tokenizer :) It's quite useful if you want to do string matching and you know ahead of time the exact list of strings you need to match with. My only issue with this article is that it explains them using Java, which I think just obscures the explanation too much, but I guess it's a Java blog.
[+] ArbitraryLimits|12 years ago|reply
Some people say to use a perfect hash function in that case; IMO it's more trouble than it's worth.
[+] nly|12 years ago|reply
Did you need full prefix and/or suffix string matching?
[+] tehwalrus|12 years ago|reply
The Dasher input method (aimed at people with disabilities who are able to enter data only by manipulating a pointer) uses a visualized trie with vertex areas proportional to linguistic probability:

http://www.inference.phy.cam.ac.uk/dasher/DasherSummary2.htm...

(Prof. Mackay, its author, uses it to teach information theory and compression, among other things.)

[+] andrewparker|12 years ago|reply
Implementing the trie (on paper, in C) was about 50% of my final exam in my intro CS undergrad class at Stanford. We had never seen tries before that point, but we had done a variety of trees, so we had the necessary prior experience. It was a fun exam and a very memorable experience.
[+] supo|12 years ago|reply
You went to Stanford and had never seen a trie in high school? There are a few high schools in Bratislava, Slovakia that teach basic data structures in their intro-to-programming courses.
[+] RomanA|12 years ago|reply
I had to do the same (on paper, in C) for my final exam in CS1 at my uni. Our professor had prepped us on how to implement a trie (insertion, searches, memory complexity), though.
[+] sambeau|12 years ago|reply
Tries are great for caching URLs & other path-like data. I've found them to be faster (and smaller) than hashes for these purposes in the past.

However, this was back in the days of tiny caches. I suspect they may have lost the edge they had in 2003.
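
A sketch of the path-trie idea: key the trie on '/'-separated path segments rather than characters (`insert_path` and `lookup_path` are hypothetical names, and `'$value'` is an assumed sentinel key):

```python
def insert_path(trie, url, value):
    """Store value under a URL in a dict-of-dicts trie keyed on
    path segments, so common prefixes like /static/ are shared."""
    node = trie
    for seg in url.strip('/').split('/'):
        node = node.setdefault(seg, {})
    node['$value'] = value

def lookup_path(trie, url):
    node = trie
    for seg in url.strip('/').split('/'):
        if seg not in node:
            return None
        node = node[seg]
    return node.get('$value')
```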

[+] sspiff|12 years ago|reply
This contradicts chatman's comment at the top, which claims a trie has poor locality. Since you have some experience with tries, do you have an explanation of how to implement tries with good locality properties?
[+] nawitus|12 years ago|reply
> ~O(1) access time

What's the difference between O(1) and ~O(1)?

[+] fournm|12 years ago|reply
I've seen ~O(1) as amortized constant time fairly frequently.
[+] mhurron|12 years ago|reply
I've always seen ~ as used for about, so ~1h is somewhere around one hour.

I would read it as close to, but maybe not quite O(1).

[+] MarcusBrutus|12 years ago|reply
The tilde notation is used for functions: you can say f(n) is ~g(n), and in that context it has a rigorous meaning (the limit of f(n)/g(n) is 1 as n goes to positive infinity). I haven't seen it used with order-of-growth notations (O, Θ, or Ω), and my guess is that ~O(1) is undefined, or somebody's just taking liberties with the notation.
[+] ape4|12 years ago|reply
This blog post appears to be to get the author's resume looked at. Seems to be working.
[+] macNchz|12 years ago|reply
17 years of both Java and JavaScript, now ain't that something...