I'm not really a data structures guy, but I love anagrams. When i wrote the anagramica API, the simplest way that I could come up with a fast search was this:
- Take a word and sort its characters.
- Add it to a dictionary where the key is the sorted characters and the value is the word.
- If the sorted characters already exist in the dictionary then add it to the list of words for the same key.
This gives O(log n) when you give it a list of letters and you need to find all the possible words.
Since finding a set of possible moves given the current Scrabble board is not quite equal to finding anagrams. To make a move in Scrabble you need to find words that, for example, have both E and L characters which are separated by two other letters (i.e. /^.{0,}E..L.{0,}$/) when the board contains a row or column with the same pattern (E, followed by two spaces and then L). A simple anagram list cannot efficiently search for such patterns.
When you are building a Scrabble engine and need to construct the list of potential moves, GADDAGs come in handy as you are able to "cast" your search from subgraphs "anchored" at letters already on board.
I believe that there is no benefit in using a GADDAG. Finding an anagram is an easier problem as you know that you will use all the letters, so sorting them is the optimal solution. But this method does not work with scrabble since you would have (at least) to find all anagrams of all subsets, which adds an exponential step.
The node packing format he describes sounds a bit like a LOUDS tree [1], which stores the structure of a tree as a bit array (each node as a '1' for each child, plus a '0'---for a total of 2n-1 bits for a tree of n nodes), and the data in a separate packed array. It can't represent the node-deduplication (nodes with multiple parents), but I think it gives comparable compression: for the full word list of 3,213,156 nodes, the tree structure is 6,426,311 bits (0.76MB), plus 3,213,156 bytes of character data---for 3.83MB total.
The downside is that traversing the tree is a series of linear bit-counting operations---which can be painfully show without a bit of pre-caching.
What open source dictionaries are being used by these scrabble programs? (I know I could bing (or DDG) the answer but would like to hear from an "insider" if possible.)
This appears to be just reinvention of known algorithms on suffix trees. I recommend (and recommended as a comment on the original article) Dan Gusfield's book "Algorithms on Strings, Trees, and Sequences" which does a pretty thorough job of covering the relevant algorithms and data structures.
The GADDAG is a standard datastructure - the original paper, linked from the article, is from 1994.
The article doesn't pretend to invent the GADDAG, nor claim to compress it better than others, only to try and explain how to simplify and pack a GADDAG.
The steps would work on all DAGs generally. This is nothing new, but hopefully its new to some of us and a nice article.
[+] [-] binarymax|11 years ago|reply
What benefits do GADDAG offer over the above?
[+] [-] lifthrasiir|11 years ago|reply
[+] [-] nathell|11 years ago|reply
A sample implementation (in not very idiomatic Clojure, not touched in years, and using DAWGs instead of GADDAGs): https://github.com/nathell/spleen/blob/master/src/pl/danielj...
[+] [-] emillon|11 years ago|reply
[+] [-] dbaupp|11 years ago|reply
[+] [-] btn|11 years ago|reply
The downside is that traversing the tree is a series of linear bit-counting operations---which can be painfully show without a bit of pre-caching.
[1]: http://www.cs.cmu.edu/afs/cs.cmu.edu/project/aladdin/wwwloca...
[+] [-] unknown|11 years ago|reply
[deleted]
[+] [-] SeanDav|11 years ago|reply
[+] [-] baking|11 years ago|reply
http://www.freescrabbledictionary.com/twl06.txt
[+] [-] willvarfar|11 years ago|reply
[+] [-] recursive|11 years ago|reply
[+] [-] rickbradley|11 years ago|reply
[+] [-] willvarfar|11 years ago|reply
The article doesn't pretend to invent the GADDAG, nor claim to compress it better than others, only to try and explain how to simplify and pack a GADDAG.
The steps would work on all DAGs generally. This is nothing new, but hopefully its new to some of us and a nice article.