
Show HN: 5+ Billion Passwords in Order of Most Popular

184 points | berzerk0 | 8 years ago | github.com | reply

52 comments

[+] lucasgonze|8 years ago|reply
This list is immediately useful for validating user-created new passwords. Just stop with the bizarre rules about having uppercase, lowercase, symbols, numbers, length, etc. Instead require a string not in the top 10K (or 100K, or 1M) most popular.
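
A minimal sketch of that check, assuming the wordlist is a plain text file with one password per line in descending popularity. The names `load_blocklist` and `is_acceptable` and the four-entry sample are hypothetical, for illustration only; any iterable of lines, such as an open file, would work in place of the sample list.

```python
from itertools import islice


def load_blocklist(lines, top_n):
    """Return the first top_n entries of a popularity-ordered list as a set."""
    # islice stops after top_n lines, so a huge file is never read in full.
    return {line.rstrip("\r\n") for line in islice(lines, top_n)}


def is_acceptable(password, blocked):
    """Accept any password that is not among the most popular ones."""
    return password not in blocked


# Hypothetical sample standing in for a real wordlist file.
sample = ["123456", "password", "qwerty", "daniel"]
blocked = load_blocklist(sample, top_n=3)

print(is_acceptable("password", blocked))  # inside the top 3, rejected
print(is_acceptable("daniel", blocked))    # outside the top 3, accepted
```

Raising `top_n` to 10K, 100K, or 1M only changes how much of the file is read; a set lookup stays constant time on average.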
[+] microwavecamera|8 years ago|reply
"Your password must be between 12 to 46 characters long and must include at least one number, upper case character, special character, kanji character, rune and quadratic equation"
[+] bradleyjg|8 years ago|reply
Since the list is only rarely updated, this seems a perfect application for Botelho's minimal perfect hashing algorithm[1]. At about 8 bits per item of storage and constant-time lookup, it would be quite practical to use the top 32 Million list (appeared at least 10 times).

[1] http://cmph.sourceforge.net/papers/tr06.pdf

[+] expertentipp|8 years ago|reply
Note that this password list is mainly valid for the English-speaking world.
[+] gfody|8 years ago|reply
you should package this up as a bloom filter and simple js routine web developers could use to do client-side checks to validate passwords.

edit: on 2nd thought looks like a bloom filter for 5B entries at p=0.01 would be ~5GB, so not exactly convenient
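
That estimate can be checked with the standard Bloom filter sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The sketch below assumes an optimally sized filter; the function names are hypothetical. For 5 billion entries at p=0.01 it gives roughly 6 GB (about 5.6 GiB), the same order of magnitude as the figure above.

```python
import math


def bloom_size_bits(n, p):
    """Bits needed for n items at false-positive rate p (optimal sizing)."""
    return math.ceil(-n * math.log(p) / (math.log(2) ** 2))


def optimal_hashes(n, m):
    """Number of hash functions that minimizes false positives for m bits."""
    return max(1, round(m / n * math.log(2)))


n = 5_000_000_000
bits = bloom_size_bits(n, 0.01)
gigabytes = bits / 8 / 1e9

print(f"{gigabytes:.2f} GB, {optimal_hashes(n, bits)} hash functions")
```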

[+] berzerk0|8 years ago|reply
The largest list is 20GB, but it's not the only one.

Popularity was based on how many times passwords appeared across files, each of which had its own internal duplicates removed.

The smallest file had passwords that appeared 75+ times, and the largest file had passwords that appeared 2+ times.

The top 195 Thousand (which appeared 25+ times in analysis) clocks in at 803kb as a text file with nothing but the passwords themselves

[+] surement|8 years ago|reply
Password validation is stupid and annoying. At best this should be used to display a warning that can be ignored.
[+] Roritharr|8 years ago|reply
Somehow I wonder if life is significantly different for people named Daniel.
[+] fasteo|8 years ago|reply
Or Alexander or Victoria. It's weird that these first names appear in the Top196-probable.txt file.
[+] berzerk0|8 years ago|reply
Having issues with the Seedbox, torrents are down temporarily.

The main page contains links to Mega.NZ alternative downloads. Will be fixed shortly, apologies for the inconvenience.

[+] kaslai|8 years ago|reply
Is there really any reason to multiply the downloads by 4 just to have different compression methods? It seems to me like that just needlessly dilutes the seed swarm and wastes space, since pretty much any modern archive reader can unpack all the provided formats. Sure, specialized command-line utils can't, but if you're using one of those then you probably know which one to use for a given format.
[+] infinisil|8 years ago|reply
Have you considered using IPFS for distribution?
[+] berzerk0|8 years ago|reply
Torrents are back! Mostly.

Two out of the 12 aren't, but every wordlist can be downloaded via torrent in at least 3 compressed formats, including .7z

[+] jacquesm|8 years ago|reply
I don't understand: Why do you assume that password checkers keep their lists in alphabetically sorted form rather than just loading the whole thing into a db table with an index on it?
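
A rough sketch of the indexed-table approach, using Python's built-in `sqlite3` module with an in-memory database. The table name, column name, `is_listed` helper, and sample entries are all hypothetical. Declaring the column as `PRIMARY KEY` makes SQLite build a unique index on it, so the order of the input file does not matter.

```python
import sqlite3

# Hypothetical sample; a real loader would stream lines from a wordlist file.
sample = ["qwerty", "123456", "daniel", "password"]

conn = sqlite3.connect(":memory:")
# PRIMARY KEY gives the column an automatic unique index.
conn.execute("CREATE TABLE passwords (pw TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT OR IGNORE INTO passwords (pw) VALUES (?)",
    [(p,) for p in sample],
)
conn.commit()


def is_listed(conn, candidate):
    """Return True if candidate is in the table, via an index lookup."""
    row = conn.execute(
        "SELECT 1 FROM passwords WHERE pw = ? LIMIT 1", (candidate,)
    ).fetchone()
    return row is not None


print(is_listed(conn, "qwerty"))
print(is_listed(conn, "correct horse battery staple"))
```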
[+] berzerk0|8 years ago|reply
As I was looking around for the files to make this project, on SecLists, Weakpass, and Hashes.org, most of the files were in alphabetical order. This was especially true for the larger files.
[+] smaili|8 years ago|reply
Not to sound negative, but is this something we should really be exposing? It feels like the only ones who gain from this are hackers and password crackers, no?
[+] microwavecamera|8 years ago|reply
Like others have mentioned, the criminal hackers are already up on this stuff. It's more beneficial to the security community as a whole to expose these things so you know how to implement effective security policies. When you're trying to convince management to implement something security related, it's one thing to say "We think this is a threat" versus saying "hey watch this" and demonstrating the threat.
[+] mvdwoord|8 years ago|reply
Audits. Education. Change.

It's not like bad guys don't already have wordlists, or that they are new in any way. Having good ones available, in the open, for everyone to use, provides a net benefit in the long run imo.

[+] qeternity|8 years ago|reply
It's already exposed. That's the point. The adversary has this intel. It would be irresponsible not to study and publish it, thereby reducing their informational edge.
[+] ignoramous|8 years ago|reply
To quote a certain Alfred Charles Hobbs: "Rogues are very keen in their profession, and know already much more than we can teach them."

Also see: Security through Obscurity.

[+] macscam|8 years ago|reply
Wow this is so useful for um security
[+] bbcbasic|8 years ago|reply
If you pick the 5 billionth one you've probably picked a gooden.
[+] dang|8 years ago|reply
We've banned this account for abusing the site, including posting many unsubstantive comments after we asked you to stop. Please don't create accounts to break the site rules with.
[+] Mz|8 years ago|reply
•These lists are for LAWFUL, ETHICAL AND EDUCATIONAL PURPOSES ONLY.

Yeah, like that is going to stop people from doing nefarious things with this info. If you feel the need to post this screechy, all caps disclaimer, maybe rethink your project entirely?

Geez.