
I Am Releasing Ten Million Passwords

594 points| m8urn | 11 years ago |xato.net | reply

216 comments

[+] tptacek|11 years ago|reply
Barrett Brown was not convicted merely for linking to data on the web. He was convicted for three separate offenses:

1. Acting as a go-between for the Stratfor hacker (presumably Jeremy Hammond) and Stratfor itself, Brown misled Stratfor in order to throw the scent off Hammond. Having intimate knowledge of a crime doesn't automatically make one liable for that crime, but it does put them in a precarious legal position if they do anything to assist the perpetrators.

2. During the execution of a search warrant, Brown helped hide a laptop. Early in the trial, in advancing the legal theory that hiding evidence is permissible so long as that evidence remains theoretically findable in the scope of the search warrant, Brown admitted to doing exactly that, and that's a crime for the same reason that it's a crime when big companies delete email after being subpoenaed.

3. Brown threatened a named FBI agent and that agent's children on Twitter and in YouTube videos.

The offense tied to Brown's "linking" was dismissed.

Brown's sentence was unjust, but it wasn't unjust because he was wrongly convicted by a trigger-happy DOJ; rather, he got an outlandish sentence because he managed to stipulate a huge dollar figure for the economic damage caused by the Stratfor hack, which he became a party to when he helped Hammond.

[+] jbapple|11 years ago|reply
What were the threats against the agent and the agent's children? I'm asking because I read some of them ("ruin his life", "look into" his kids), but I'm not sure which of those are protected under the First Amendment.

Broad categories of rude speech are protected under the First Amendment, including things like, IIRC:

1. Saying if President Johnson makes you pick up a gun, he'll be the first in your rifle sight. (Watts v. United States)

2. Telling a cop "I'll kill you, you white devil" while you are in handcuffs and unable to kill him. (? v. ?)

3. Swearing "revengeance" upon the Jews. (Brandenburg v. Ohio)

[+] downandout|11 years ago|reply
>The offense tied to Brown's "linking" was dismissed

This masks the scary reality that someone was indicted, arrested, and prosecuted for posting a link (not to mention that it was dismissed as part of a plea - not for lack of legal merit). While in this case there were other charges as well, there didn't have to be - all of the same pre-trial horrors (including possible detention without bail) could have occurred with only that charge. The fact that such a charge may eventually be dismissed/beaten at trial after your life is burnt to the ground for posting a link is little comfort.

[+] Slartibreakfast|11 years ago|reply
I don't know, sounds like he got off pretty lightly considering he threatened an FBI agent's children. I would expect the jail time would be a lot higher, but I guess I don't know what guides the court's decisions in these kinds of cases. I suppose five is enough time for him to figure out the error of his ways.
[+] egocodedinsol|11 years ago|reply
but what do you think about the big picture?

I don't know much of the specifics about Brown, but I think the wider point is worth discussing, especially with respect to the proposed change in legislation.

[+] sarciszewski|11 years ago|reply
> Barrett Brown was not convicted merely for linking to data on the web.

From the article:

     Most of us expected that those charges would be dropped and some were, although they still influenced his sentence.
I want to be generous and say that the author meant what you said. The linking was not something Brown was charged with, but it was brought up during the sentencing and probably influenced the length of his prison sentence.

So while you're correct that Brown was not charged with linking to information, it's worth noting that this was still used against him anyway.

Also, people who think the linking to hacked data was the only thing that got him arrested are being disingenuous (or are simply ignorant).

[+] higherpurpose|11 years ago|reply
It's interesting that you say his sentence was "unjust" given that you always seem to defend crazy sentences as "not being the real ones anyway".

Also those three sound like incredibly weak charges, and yet you somehow defend the prosecution over them.

[+] LeoPanthera|11 years ago|reply
Fun!

    $ export LC_ALL='C'
    $ awk '{ print $2 }' 10-million-combos.txt | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | head -n 20
    55893 123456
    20785 password
    13582 12345678
    13230 qwerty
    11696 123456789
    10938 12345
    6432 1234
    5682 111111
    4796 1234567
    4191 dragon
    3845 123123
    3734 baseball
    3664 abc123
    3655 football
    3330 monkey
    3206 letmein
    3136 shadow
    3126 master
    3050 696969
    3002 michael
Edit: I used Wordle[1] to make a wordcloud of the top 1000 passwords: http://i.imgur.com/FImcPiG.png

[1]: http://www.wordle.net

[+] meowface|11 years ago|reply
I don't understand exactly why it's necessary to release usernames along with the passwords, or why it's ethical to do so. Stripping the domain portion of email addresses does absolutely nothing when you can find the real email, and other accounts of the victim, by Googling the unique part of the email address.

How does tying each password to its corresponding username help with password research, and does the value gained outweigh the cost of someone using this list for malicious purposes?

I'm not saying this should be illegal, but I'm struggling to understand the intent here.

[+] a3_nm|11 years ago|reply
What about research to determine to what extent usernames with words in a certain language will tend to use passwords with words for the same language? (More generally, is there any connection between the bi- or trigram distribution on usernames and the one on passwords? In fact, do they just look the same, or could you tell given a string whether it's more likely a username or a password?)

Do usernames of people with weaker passwords have something in common? How do they differ from people with stronger passwords? In France there is a practice of picking names like "foobar42" or "foobardu42", where "foobar" is a first name and 42 a "département" (country subdivision) number, which I would associate to casual users. Here I could quantify whether people with usernames of this form tend to pick weaker passwords. Insert your favorite prejudice here about lame and skilled username patterns, and quantify how the password diversity of this group fares in comparison with others.

Is it true that the most common passwords were associated to usernames that were also common? Does username frequency correlate with password frequency? Are there more people with unique usernames or people with unique passwords?
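The last question above can be checked with a few lines of awk. This is only a sketch: it assumes the file has one `username<TAB>password` pair per line, and the function name `count_singletons` is made up for illustration.

```shell
# Assumption: each line of the file is "username<TAB>password".
# Prints how many usernames and how many passwords occur exactly once.
count_singletons() {
    awk -F '\t' '
        { users[$1]++; pws[$2]++ }
        END {
            for (u in users) if (users[u] == 1) uu++
            for (p in pws)   if (pws[p] == 1)   up++
            printf "unique_usernames=%d unique_passwords=%d\n", uu, up
        }
    ' "$1"
}

# Usage: count_singletons 10-million-combos.txt
```

Both tables are held in memory, so on a file of this size expect it to need a fair amount of RAM.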

In some countries it is customary to annotate usernames with the user's year of birth. Filtering on such usernames could give insight about the correlation between age and password quality, or identify which passwords are more or less popular given the user age. You could try to check correctness of the filter using the fact that some of those people may have used their birthdate (including the year) as a password.

If a seemingly rare password in the dataset only occurs for two distinct user names, then maybe those two user names actually correspond to the same user. Do such usernames have a low edit distance? Could you use this to learn general rules to determine, given two usernames, whether they seem to correspond to the same person?
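A rough way to pull out such candidate pairs, again assuming `username<TAB>password` lines. The name `shared_by_two` is hypothetical, and the edit-distance comparison is left out; this only lists the pairs one would then compare.

```shell
# Assumption: each line of the file is "username<TAB>password".
# Prints "password<TAB>user1<TAB>user2" for every password that appears
# on exactly two distinct lines (exact duplicate lines are ignored).
shared_by_two() {
    awk -F '\t' '
        !seen[$0]++ {
            c = ++n[$2]
            if (c == 1) first[$2] = $1
            else if (c == 2) second[$2] = $1
        }
        END { for (p in n) if (n[p] == 2) print p FS first[p] FS second[p] }
    ' "$1" | LC_ALL=C sort
}

# Usage: shared_by_two 10-million-combos.txt | head
```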

I just gave those off the top of my head, and I'm not at all working in this field, but I'd have no trouble imagining interesting applications for this data that would not have been possible with the passwords alone.

[+] jMyles|11 years ago|reply
A list of 10 million passwords alone answers almost no questions. In fact, it's probably possible to programmatically predict, with a depressing level of accuracy, what a great deal of such a list will look like, given the already available research about the distribution of complexity, the parts of speech and numbers commonly used and in what patterns, etc.

So, the next interesting question is: given the already plaintext-available lists of usernames and passwords, just how much coverage is there in the known space? Are your passwords known? Are your users' and clients' passwords known?

This document is perfect for a true positive on the matter of needing to deprecate particular combinations of username and password, and, as an obvious corollary, presenting evidence for consultation advice about the same. (Of course, being only a sample, it doesn't say anything about a true negative.)

[+] yalogin|11 years ago|reply
Before I go into the research aspect of it, there is no reason to hide the usernames from the passwords. They are already out there. The bad guys have them. So why not release them so that everyone can look at them?

Also I am sure there are some research aspects to the usernames. At the very least behavioral deductions that can be drawn based on these combinations.

[+] detaro|11 years ago|reply
Probably to find out how many people do stuff like type their username backwards as a password, or what kind of patterns they use. Whether that is useful enough information to warrant publishing data like this is debatable, yes.
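For what it's worth, that particular pattern is easy to count. A sketch, assuming `username<TAB>password` lines; the function name `count_username_passwords` is invented here.

```shell
# Assumption: each line of the file is "username<TAB>password".
# Counts lines where the password equals the username, or the username
# reversed, ignoring case. A palindromic username counts as "same".
count_username_passwords() {
    awk -F '\t' '
        {
            u = tolower($1); p = tolower($2); r = ""
            for (i = length(u); i > 0; i--) r = r substr(u, i, 1)
            if (p == u) same++
            else if (p == r) reversed++
        }
        END { printf "same=%d reversed=%d total=%d\n", same, reversed, NR }
    ' "$1"
}

# Usage: count_username_passwords 10-million-combos.txt
```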
[+] diminoten|11 years ago|reply
I dunno if he should have said "released", because he's not releasing any new data. Everything he's posted is already available to anyone with a search engine and a bit of curiosity.

So if you're concerned that information which wasn't previously public is now public, you can be at ease -- all of this data was not only public already, but less "cleaned up".

[+] hasenj|11 years ago|reply
I'm curious to see if any of my accounts/passwords have been compromised.
[+] presumeaway|11 years ago|reply
> I'm struggling to understand the intent here.

A desire for a particular type of attention his ego seems to need.

Which, combined with either a moronic lack of appreciation for the hassle and damage he's going to cause to end-users who've already been hosed once before, or an arrogance that makes him not care, makes him difficult to fit for a white hat.

FTA:

> This is completely absurd that I have to write an entire article justifying the release of this data out of fear of prosecution

What's absurd is his assumption that stripping domain names is somehow sufficient.

Edit: I'm getting downvoted like crazy here. Which is fine, but people seem to think it's ad hominem because I'm narrowing the reasons behind why someone would release a data set with a considerable price of collateral damage attached to it, while doing very little to mitigate that damage.

Just because the likely options for why someone would do such a thing don't speak favorably of the person, doesn't make it ad hominem. An ad hominem attack is seeking to undermine someone's argument by attacking their character.

I'm saying Mark Burnett made it difficult to assume good things about him after a stunt like that. If he actually made a real argument that what he did was sufficient, or that the harm he's going to cause is more than offset by the greater good it'll do (or some such argument), then we'd have something to try to undermine (whether legitimately or fallaciously), but as it stands, he hasn't even justified his actions.

[+] zaroth|11 years ago|reply
There is an annual 'Passwords' conference [1], which I attended in 2012, and was blown away by quite how much researchers are able to do with these password lists.

Unfortunately, I was equally impressed with what attackers are able to do with them as well. An important point is that attackers tend to have better lists, because they are the ones stealing and cracking them, and these lists make them increasingly better at cracking passwords. Defenders use the lists for all sorts of analysis on how exactly users pick passwords.

For example, "complex password policies" have become increasingly popular. But do they actually increase the entropy of the chosen passwords? Surprisingly little, since users will "defeat" the policy by applying easy to guess "munging rules". Humans being human and such. The thieves have the lists, and learn to apply the munging rules and defeat the policies. Researchers need these lists so they can discover the same weakness and try to react.
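To make "munging rules" concrete, here is a toy sketch. The specific rules (capitalise the first letter, a few letter-to-symbol swaps, a short list of suffixes) are assumptions picked for illustration, not rules taken from any particular study, and `munge` is a made-up name.

```shell
# Hypothetical illustration: print variants of a base word that would
# satisfy many "complex password" policies while staying easy to guess.
munge() {
    word=$1
    # Rule 1 (assumed): capitalise the first letter.
    cap=$(printf '%s' "$word" | awk '{ print toupper(substr($0, 1, 1)) substr($0, 2) }')
    # Rule 2 (assumed): common letter-to-symbol substitutions.
    leet=$(printf '%s' "$word" | sed 'y/aeios/@3105/')
    # Rule 3 (assumed): append a short, popular suffix.
    for base in "$word" "$cap" "$leet"; do
        for suffix in "" 1 123 '!'; do
            printf '%s%s\n' "$base" "$suffix"
        done
    done
}

# Usage: munge password
```

A defender can run a candidate password's obvious base word through something like this to see whether the "complex" version is really just one predictable step away from a dictionary word.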

More recent research looks at things like how effective the password strength indicators are at actually helping users choose stronger passwords. We also learn about how users choose different strength passwords based on the sites they visit and such. This is absolutely fertile ground for research which can improve how we perform authentication.

Yet another good use of the lists is in defending against online attacks. E.g. Failed attempts that follow the general probability distribution of the lists are easier to identify as bots.

[1] - I think all the talks are posted, although I'm not sure there's a central archive, each conference is identified as Passwords^[Year], e.g. Passwords^14 https://passwordscon.org/

[+] meric|11 years ago|reply
These lists were released by attackers in the first place. Attackers are always going to have the lists, and the only choice defenders have is whether or not to use them and distribute them within the defender community.
[+] pbreit|11 years ago|reply
I'd be curious what researchers were able to do with such a list (genuine, practical advances). It doesn't strike me as particularly useful.
[+] dj-wonk|11 years ago|reply
Forgive me for doing so, but allow me to ask some possibly ignorant questions and perhaps play the devil's advocate for a moment. What about this release will help? What are the compelling research problems in the space?

We know users pick bad passwords. It seems to me the most compelling "problem" is hardly a research question -- isn't it about finding ways to encourage users to pick strong passwords, not share them between sites, and not put them on sticky notes on their monitors?

Ok, putting my charitable hat on again... My best guess is that researchers would like some idea about how long it takes to crack some percentage of accounts, e.g. with rainbow tables or other techniques?
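One crude version of that measurement needs no cracking at all: what share of accounts is covered by just the N most common passwords? A sketch, assuming `username<TAB>password` lines; `top_n_coverage` is a hypothetical name.

```shell
# Assumption: each line of the file is "username<TAB>password".
# Prints the percentage of accounts whose password is among the
# N most common passwords in the file.
top_n_coverage() {
    n=$1; file=$2
    awk -F '\t' '{ print $2 }' "$file" | LC_ALL=C sort | uniq -c | sort -nr |
        awk -v n="$n" '
            { total += $1; if (NR <= n) top += $1 }
            END { if (total) printf "%.1f\n", 100 * top / total }
        '
}

# Usage: top_n_coverage 1000 10-million-combos.txt
```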

The author mentioned "Analysis of usernames with passwords is an area that has been greatly neglected and can provide as much insight as studying passwords alone." What directions might a researcher take this?

[+] stevecalifornia|11 years ago|reply
When I first got on the Internet in 1994 I used the same password for everything for the next decade before I became security conscious (now I have a random, strong, unique password for every service).

Anyways, that password is not in this list. I have found it in other password dumps before. So, I don't know what to think.

[+] totony|11 years ago|reply
From the law quoted in the article, wouldn't it be illegal to simply make a course about computer security?

The teacher willfully (and knowingly) teaches the student about "possible means of access to a protected computer."

Note: According to http://www.law.cornell.edu/uscode/text/18/1029 teaching is defined as trafficking information ("the term “traffic” means transfer, or otherwise dispose of, to another, or obtain control of with intent to transfer or dispose of; ")

[+] avid8|11 years ago|reply
Even if this release has no implications for security, I think it may raise legitimate concerns for users' privacy. No doubt most users expect that their passwords will be known only to themselves. Many of the usernames contain real names, and many more could probably be traced to them. Ian Watkins was found to have "gloated" about his crimes in his password. With time and attention, I wonder whether such "dark secrets" could be found in this list.
[+] uptown|11 years ago|reply
How are things like Twitter accounts hacked? Are they generally brute-forced with a list like this, or how do so many of them get compromised?
[+] charlespwd|11 years ago|reply
For the lazy:

  grep -i <password> 10-million-combos.txt
[+] me_bx|11 years ago|reply
for the paranoid lazy

    export HISTCONTROL=ignorespace
     grep -i <password> 10-million-combos.txt
(type a space before the command for it not to be logged in the history)
[+] flavor8|11 years ago|reply
And then history -c
[+] tarblog|11 years ago|reply
For the lazier, -i means case insensitive.
[+] failed_ideas|11 years ago|reply
This is great, but if you use a password manager, it's very difficult to determine which, if any, of your accounts would be compromised. For myself, this would just be doing a dump and looping a few greps. But for family and friends, does anyone have any ideas for a less technical audience?
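The "dump and loop a few greps" part could look roughly like the sketch below. It assumes the password manager export has been reduced to one password per line and that the dump is `username<TAB>password`; the function name `check_exported_passwords` and both file layouts are assumptions. It reports line numbers of the export instead of echoing the passwords back.

```shell
# Assumptions: $1 holds one password per line (e.g. cut from a password
# manager export); $2 is the dump with "username<TAB>password" lines.
# Prints which lines of the export appear as a password in the dump,
# using exact, case-sensitive matching.
check_exported_passwords() {
    own=$1; dump=$2
    awk -F '\t' '
        FILENAME == ARGV[1] { if ($0 != "") mine[$0] = FNR; next }
        ($2 in mine) && !reported[$2]++ {
            print "entry on line " mine[$2] " of your export appears in the dump"
        }
    ' "$own" "$dump"
}

# Usage: check_exported_passwords my-passwords.txt 10-million-combos.txt
```

Remember to delete the plaintext export afterwards.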
[+] jpatokal|11 years ago|reply
If you're using a password manager and thus -- I hope -- using a different password for every service, it doesn't really matter if one service gets compromised. The compromised service in question will (hopefully) force password resets for all affected users, and the compromised password is useless elsewhere.
[+] saraid216|11 years ago|reply
Instead of responding to breaches, I would recommend an annual (more frequent is better, obviously, but I think annual is fine) cycle of rotating passwords. Just pick a day and spend it replacing passwords. As a side effect, you get a mental update on exactly what identities you're managing and whether or not you want to modify or close them.

This should be fairly straightforward even for non-technical people, if they've got a grasp on actually using the password manager itself. The hard part is (1) getting the list of identities, which isn't too hard if you're hand-holding, and (2) actually remembering to do it. (Which is why annual is nice. You can peg it to a holiday you already celebrate, or substitute it for one you don't. Halloween, for instance, because breaches are scary? Or something.)

Bonus: if a breach happens that actually feels scary, just do the rotation ritual ahead of time. Not that big of a deal.

[+] querulous|11 years ago|reply
1password has a limited ability to warn you of compromised passwords. they maintain a database of breaches that they warn you about in their client. the warning, however, is much less prominent than it probably should be
[+] yeukhon|11 years ago|reply
http://security.stackexchange.com/questions/46625/is-it-lega...

I had exactly the same thought, motivated by the password strength meters out there. How can you actually tell whether a password is strong, or whether it is already known to attackers? What if you could ask privately (I was thinking along the lines of private information retrieval) and get a probability rather than a yes/no, based on all the known stolen credentials out on the Internet (there are many multi-gigabyte files you can download)...

[+] jammycakes|11 years ago|reply
Just a thought here. As far as I can tell, many bona fide security researchers seem to be independent consultants. Would they be less at risk of prosecution if they were handling sensitive data such as user names and passwords under the coverage of universities and/or similar accredited institutions operating under protocols as to who can and cannot access the data?

It would probably be more security theatre than actual security, but I'd imagine that it would at least keep the FBI happy.

[+] hueving|11 years ago|reply
I wish there was an origin with these. A username/password combo I use on a ton of sites I don't care about is on here. It would be nice to know which one leaked it.
[+] srcole|11 years ago|reply
What sorts of analyses are you guys planning? Maybe:

- clustering of passwords: are aspects of the username biased towards certain clusters?
- distribution of alphanumeric characters at each position of a password (e.g. 1 is a disproportionately common final character)
- differences in password strength between usernames with male and female names
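The final-character idea is the easiest of these to try. A sketch, assuming `username<TAB>password` lines; `last_char_counts` is a made-up name, and with multibyte passwords the result depends on how your awk handles the locale.

```shell
# Assumption: each line of the file is "username<TAB>password".
# Prints the N most common final characters of the passwords,
# each preceded by its count (uniq -c format).
last_char_counts() {
    n=$1; file=$2
    awk -F '\t' 'length($2) > 0 { print substr($2, length($2), 1) }' "$file" |
        LC_ALL=C sort | uniq -c | sort -nr | head -n "$n"
}

# Usage: last_char_counts 10 10-million-combos.txt
```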
[+] camhenlin|11 years ago|reply
Man, I hope my password isn't in there.
[+] cwarrior|11 years ago|reply
What's your password? I could check the file to see if it's there. I found one of mine. Does anybody know where these passwords are from?
[+] m8urn|11 years ago|reply
Actually three of my own passwords are on there, I left them in
[+] untog|11 years ago|reply
Read the actual article. None of this data is new:

> All data currently is or was at one time generally available to anyone and discoverable via search engines in a plaintext

[+] aceperry|11 years ago|reply
My thoughts exactly. I'm amazed that I can download the file, but at least I get to see if any of my passwords are there.
[+] tomkinstinch|11 years ago|reply
To save a moment of time, here's a quick check that won't save the password string to your command history:

    read -r -s -p "Password: " password && echo && grep -i -c -F -- "$password" 10-million-combos.txt; unset password