strags | 6 years ago

I recently needed to encode a 32-bit value into something easy for QA folks to remember and report. I opted for 3 words out of an 11-bit (2048 entry) dictionary of commonly used words.
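As a rough sketch of the scheme described above (not the poster's actual code): three 11-bit indices give 33 bits, which is enough to hold a 32-bit value. The function names and the placeholder word list below are hypothetical; a real list would contain 2048 distinct common words.

```python
BITS_PER_WORD = 11
WORD_COUNT = 3
MASK = (1 << BITS_PER_WORD) - 1  # 0x7FF


def make_placeholder_wordlist():
    # Hypothetical stand-in for a real 2048-entry dictionary.
    return [f"word{i:04d}" for i in range(1 << BITS_PER_WORD)]


def encode(value, wordlist):
    """Split a 32-bit value into three 11-bit indices, most significant first."""
    if not 0 <= value < 2**32:
        raise ValueError("value must fit in 32 bits")
    if len(wordlist) != 1 << BITS_PER_WORD:
        raise ValueError("wordlist must have exactly 2048 entries")
    shifts = range(BITS_PER_WORD * (WORD_COUNT - 1), -1, -BITS_PER_WORD)  # 22, 11, 0
    return [wordlist[(value >> shift) & MASK] for shift in shifts]


def decode(words, wordlist):
    """Reverse of encode(); rejects word triples that exceed 32 bits."""
    index = {word: i for i, word in enumerate(wordlist)}
    value = 0
    for word in words:
        value = (value << BITS_PER_WORD) | index[word]
    if value >= 2**32:
        raise ValueError("words do not encode a 32-bit value")
    return value


if __name__ == "__main__":
    wl = make_placeholder_wordlist()
    print(encode(0xDEADBEEF, wl))
```

Because 33 bits are available and only 32 are used, the first word only ever comes from the lower half of the list (indices 0 to 1023).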

How to build the dictionary? Well, in order to determine the most commonly used English words, I downloaded a bunch of free texts from Project Gutenberg and did some simple filtering: nothing shorter than 5 letters, no duplication of singular + plural, etc.
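A minimal sketch of that kind of frequency count and filtering, assuming the texts are already loaded as strings. The function name and the plural rule (drop a word ending in "s" when the same word without the "s" also appears in the corpus) are my own simplifications, not necessarily what was actually used.

```python
import re
from collections import Counter


def build_wordlist(texts, size=2048, min_len=5):
    """Return up to `size` of the most frequent words that pass the filters."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep runs of ASCII letters only.
        counts.update(re.findall(r"[a-z]+", text.lower()))

    selected = []
    for word, _ in counts.most_common():
        if len(word) < min_len:
            continue  # nothing shorter than min_len letters
        if word.endswith("s") and word[:-1] in counts:
            continue  # naive plural check: the singular form was seen too
        selected.append(word)
        if len(selected) == size:
            break
    return selected


if __name__ == "__main__":
    sample = ["whales whales whale ships ship harbor harbor harbor the the the"]
    print(build_wordlist(sample))
```

The output still needs a manual read-through; a frequency count has no idea which words are inappropriate.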

A valuable lesson that I learned during this process is that when your corpus includes older English texts, you should always give your final list a visual once-over and apply some judicious manual filtering. I'm looking at you, "The Adventures of Tom Sawyer". (And, to a lesser extent, "Moby Dick".)

Dylan16807|6 years ago

In most cases, if you need a short list, it's better to use something like the Diceware or EFF lists than to make your own from scratch.

cyphar|6 years ago

Or use the BIP39 lists, since they also have 2048 entries (11 bits per word). If you use the BIP39 encoding itself you also get a checksum. RFC 1751[1] is the "standardised" option, but IMHO the wordlist it uses is far too easy to misread (though this is because the words are all four characters or fewer).

[1]: https://tools.ietf.org/html/rfc1751
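To illustrate the checksum idea mentioned above: BIP39 appends the first ENT/32 bits of SHA-256(entropy) to the entropy before splitting it into 11-bit words. BIP39 itself only specifies entropy sizes from 128 to 256 bits, so the sketch below, which applies the same construction to a 32-bit value (1 checksum bit, 33 bits, 3 words), is an assumption for illustration and not valid BIP39. The function names and the placeholder list are hypothetical.

```python
import hashlib


def make_placeholder_wordlist():
    # Hypothetical stand-in; a real setup would load a 2048-word list.
    return [f"word{i:04d}" for i in range(2048)]


def checksum_bit(value):
    """Top bit of SHA-256 over the value's 4 big-endian bytes."""
    digest = hashlib.sha256(value.to_bytes(4, "big")).digest()
    return digest[0] >> 7


def encode_with_checksum(value, wordlist):
    if not 0 <= value < 2**32:
        raise ValueError("value must fit in 32 bits")
    combined = (value << 1) | checksum_bit(value)  # 33 bits in total
    return [wordlist[(combined >> shift) & 0x7FF] for shift in (22, 11, 0)]


def decode_with_checksum(words, wordlist):
    index = {word: i for i, word in enumerate(wordlist)}
    combined = 0
    for word in words:
        combined = (combined << 11) | index[word]
    value, bit = combined >> 1, combined & 1
    if checksum_bit(value) != bit:
        raise ValueError("checksum mismatch")
    return value


if __name__ == "__main__":
    wl = make_placeholder_wordlist()
    words = encode_with_checksum(0xDEADBEEF, wl)
    print(words, hex(decode_with_checksum(words, wl)))
```

A single checksum bit only catches about half of random mistakes, so at this size it is a weak sanity check rather than real error detection.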