Did you consider using the gTLD zone files (from the respective registries) and the ccTLD zone files found at http://viewdns.info/data/? That would give you a much bigger initial dataset than 25M domains.
The Sonar FDNS set contains about 1.4 billion host names (50M+ domains). The FDNS set is seeded from TLD zones, CZDAP, PTR lookups (RDNS), SSL/TLS scans, and HTTP link extraction. It updates every two weeks: https://github.com/rapid7/sonar/wiki/Forward-DNS
jtwaleson|10 years ago
773733
About 0.77M of the Alexa top 1M domains were not in my list.
$ cat alexa alexa myset | sort | uniq -u | wc -l
25842205
I mined 25,842,205 additional domain names.
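For anyone puzzled by the double `alexa` in that command: listing one file twice makes every one of its lines occur at least twice, so `uniq -u` drops them and only lines exclusive to the other file survive. Here is a minimal sketch of the idiom on made-up data. The file contents and the temp directory are hypothetical, and it assumes neither list contains internal duplicates (a domain repeated inside `myset` would also be dropped).

```shell
# Hypothetical sample lists standing in for the real alexa and myset files.
tmp=$(mktemp -d)
printf '%s\n' a.com b.com c.com > "$tmp/alexa"
printf '%s\n' b.com c.com d.com e.com > "$tmp/myset"

# Domains only in myset: alexa is listed twice, so each of its lines
# appears at least twice after sorting and uniq -u discards it.
only_in_myset=$(cat "$tmp/alexa" "$tmp/alexa" "$tmp/myset" | sort | uniq -u | wc -l | tr -d ' ')

# Domains only in alexa: same trick with myset listed twice.
only_in_alexa=$(cat "$tmp/alexa" "$tmp/myset" "$tmp/myset" | sort | uniq -u | wc -l | tr -d ' ')

echo "only in myset: $only_in_myset"   # d.com and e.com, so 2
echo "only in alexa: $only_in_alexa"   # a.com, so 1

rm -rf "$tmp"
```

If both files are already sorted, `comm -13 alexa myset` should give the same "only in myset" lines without the duplication trick.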
howaboutit|10 years ago
yazriel|10 years ago
(And yes, now I see that you mentioned it in the article. It took me a while to get there.)
hdmoore|10 years ago