
Why Deleting Sensitive Information from GitHub Doesn't Save You

313 points| jwcrux | 11 years ago |jordan-wright.github.io | reply

82 comments

[+] guiambros|11 years ago|reply
> In this post, I’m going to show exactly how hackers instantly harvest information committed to public Github repositories...

A few days ago I published my blog to GitHub, with my MailGun API key in the config file (stupid mistake, I know). In less than 12 hours, spammers had harvested the key AND sent a few thousand emails with my account, using my entire monthly limit.

Thankfully I was using the free MailGun account, which is limited to only 10,000 emails/month, so there was no material damage. Their tech support was awesome in immediately blocking the account and notifying me, and then quickly helping to unblock the account after keys and passwords were changed, and repo made private.

I was wondering exactly how they were able to harvest GitHub content so quickly; it couldn't be web scraping or a random search. This article explains well how to drink from GitHub's events firehose and the GHTorrent project, so everything makes sense now. Thanks for posting it.

EDIT: This other post[1] describes a similar situation. There are some folks monitoring ALL GitHub commits and getting passwords as they are committed, on the fly.

[1] http://www.devfactor.net/2014/12/30/2375-amazon-mistake/

[+] infinitone|11 years ago|reply
I had a similar but less pleasant experience. I had decided to open-source an old side project of mine that gets a good number of users daily. Initially that just meant making the repo public. But I had totally forgotten about the mail server keys. This was a paid mail server, so you can imagine my disbelief when I got an email with a $1000 bill and a complaint saying that I had sent upwards of 250k emails containing what seemed to be an iOS mail app malware email. Luckily it was resolved within a week with support.
[+] olefoo|11 years ago|reply
There's a fairly straightforward pattern for keeping sensitive credentials out of GitHub. It comes straight from http://12factor.net/config: store configuration data in the environment.

What I do for most projects is keep the tree containing the working directory in a directory that has some other items that don't belong on github (like the project brief, my emacs bookmarks file, random notes related to the project etc. ) and in that directory there is a .credentials file containing a set of export statements somewhat like:

    export AWS_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXX
    export AWS_SECRET_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    export AWS_USER_ID=############

If I'm feeling extra paranoid, I'll encrypt that into a blob that I only decrypt when I'm working on said project.

Then at startup the app goes looking for its config in the environment. This does create issues for some environments (solving this for Docker is trivial), but you can usually pass environment variables to whatever is executing your code reasonably securely. Now it's not perfect, and environments can sometimes be revealed externally if an attacker is determined and clever and focused on your app for some reason.

But it does give you a hygienic procedure that keeps your credentials that are equivalent to an open draw on your bank account out of public repositories.
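
A minimal sketch of the startup side in Python, assuming the three variable names from the export lines above. `load_credentials` and `REQUIRED_KEYS` are made-up names for illustration, not part of any library:

```python
import os

# Assumed variable names, taken from the export statements in the comment above.
REQUIRED_KEYS = ("AWS_ACCESS_KEY", "AWS_SECRET_KEY", "AWS_USER_ID")


def load_credentials(environ=None):
    """Return the required credentials from the environment, or fail loudly."""
    env = os.environ if environ is None else environ
    # Treat unset and empty values the same way: both are missing.
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}
```

Failing at startup when a variable is missing is a design choice here: it surfaces a forgotten `source .credentials` immediately instead of at the first API call.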

[+] califield|11 years ago|reply
I use the `dotenv`[1] package with Node.js and it does exactly the same thing: environment variable definitions that you can store elsewhere in a dead-simple format.

To be fair, I think they just copied the `foreman` tool from Heroku. However, it works great. Most projects don't need anything more than a flat hierarchy of secret keys and values.

Writing your own parser for a `.env` file is a piece of cake, even in shell language.
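
For illustration, a rough Python sketch of such a parser. It assumes a simplified format (one `KEY=value` per line, `#` comments, an optional `export ` prefix, optional matching quotes) and does not claim to match what `dotenv` or `foreman` actually accept; `parse_dotenv` is a hypothetical name:

```python
def parse_dotenv(text):
    """Parse simple KEY=value lines into a dict (simplified, assumed format)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore the line
        value = value.strip()
        # Strip one pair of matching single or double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values
```

The result can then be merged into the process environment, for example with `os.environ.update(parse_dotenv(open(".env").read()))`, where the `.env` path is whatever location you keep outside the repo.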

Adding `etcd` is better, but it's too much work for a small project.

[1] https://github.com/motdotla/dotenv

[+] tomphoolery|11 years ago|reply
It should be noted that GitHub's article on removing sensitive data is still applicable if you haven't pushed anything back to GitHub yet. Remember that a commit is just an entry in your local repo; it doesn't synchronize with `origin/master` until you tell it to. So if the user has not pushed to GitHub yet, but has committed in their local Git repo, they should follow GitHub's guide and not worry about changing any keys.
[+] ncallaway|11 years ago|reply
While it's absolutely true that if the credentials haven't been pushed then you are not compromised, I would still encourage people to rotate their credentials regardless.

All it takes is a mistake when deleting the sensitive information, or having pushed without realizing it to be compromised. Even if you're absolutely positive there wasn't a breach, it can be a good excuse to drill for a _real_ breach later.

It never hurts to walk through the practice of what to do if credentials leak when there's no pressure.

[+] PhantomGremlin|11 years ago|reply
If you ever put anything out on the Internet, not just to GitHub, consider it to be public information. Forever. You might be able to convince archive.org to remove it, but there are hundreds of players out there who aren't as ethical.

Ben Franklin figured this out many years ago:

   Three can keep a secret,
   if two of them are dead.
[+] revelation|11 years ago|reply
So many words for one simple principle: if sensitive data has been publicly accessible or transferred in plaintext over the internet, consider it compromised, logged, stored, and abused.

The only recourse is to immediately change or revoke access.

[+] nutanc|11 years ago|reply
I think this problem is widespread enough, and there are enough idiots out there (me included), that there should be a feature request for Github to provide a prompt in case Github detects sensitive information in the code hosted.
[+] zorked|11 years ago|reply
Amazon crawls Github looking for keys and disables them. Happened to my company once!
[+] mkal_tsr|11 years ago|reply
> there should be a feature request for Github to provide a prompt in case Github detects sensitive information in the code hosted.

Sure, just enumerate any and all possible types of sensitive data, the format they may be in, regex / matching functions to account for them (supported across 20+ programming languages) and I'm sure Github will have that done asap.

Alternatively, don't commit passwords/API-keys/sensitive-info to your repo.

[+] gojomo|11 years ago|reply
Similarly, a white hat could watch /events and warn users and/or services when credentials are 'burned'. (A major exploitable service like AWS might even want to do this itself.)
[+] RexRollman|11 years ago|reply
I don't know about that. It seems to me that you are wanting Github to do what the committer should be doing.
[+] akerl_|11 years ago|reply
To be clear, the guide from GitHub that's linked at the top of this article clearly states that you should consider the sensitive data compromised. Cleaning it out of the repo is a good move, but it's a companion move to rotating out those creds or whatever for new ones.
[+] fragmede|11 years ago|reply
That guide highlights this in its own box. In red.

Not sure how GitHub could make it more obvious.

Perhaps if they mentioned there are unscrupulous users out there who have a script that hammers GitHub's events API to search for exposed passwords/keys, then it would reduce the 'oops I only pushed it for a second' thinking that users likely go through.

[+] femto113|11 years ago|reply
My advice: USE PRIVATE REPOS! At $7/month Github's micro plan with 5 repos is just $1.40/repo-month. This is the cheapest insurance you can get against the nearly inevitable mistake of committing something sensitive.
[+] stinos|11 years ago|reply
Could also have a look at BitBucket instead: unlimited private personal (or teams of max 5) repos at $0/month. Or for $7/month you can host your own at DigitalOcean/Azure/...
[+] mmahemoff|11 years ago|reply
Sure, use private repos for private projects but this is about open-source authors accidentally leaving their credentials in config files and the like.
[+] xasos|11 years ago|reply
Always use environment variables. They are probably the best way to safeguard your API keys.
[+] TTPrograms|11 years ago|reply
I've always wondered the proper way to deal with this, and this makes total sense. How would you typically set such an environment variable? In bash init?
[+] xasos|11 years ago|reply
It always amazes me to see the sheer amount of API keys left around in GitHub repositories. You can search anything like Twilio API Key and come out with hundreds of thousands of results. I wonder to what extent these keys have been exploited.
[+] baxter001|11 years ago|reply
A script to post random key-containing, config-like files to public repos and waste these guys' bandwidth/light them up on Amazon's blacklist radar would be a cool idea.
[+] DenisM|11 years ago|reply
On macOS there's Keychain, a designated place for storing secrets.

On Windows I create a batch file at a fixed location with all the credentials in it. A script simply runs this batch file and reads the env vars to get values. A compiled program parses the batch file with regex to find required values. This works remarkably well for keeping credentials out of the code base.
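
The regex-parsing half might look roughly like this in Python. This is only a sketch, assuming the batch file uses plain `set NAME=value` or `set "NAME=value"` lines; `read_batch_credentials` and the variable names in the usage note are invented for the example:

```python
import re

# Matches `set NAME=value` and `set "NAME=value"` at the start of a line.
# Lines starting with anything else (for example `rem set ...`) are ignored.
SET_LINE = re.compile(
    r'^\s*set\s+"?([A-Za-z_][A-Za-z0-9_]*)=([^"\r\n]*)"?\s*$',
    re.IGNORECASE | re.MULTILINE,
)


def read_batch_credentials(text):
    """Extract NAME -> value pairs from the text of a credentials batch file."""
    return {name: value.strip() for name, value in SET_LINE.findall(text)}
```

Given a file containing `set MAIL_KEY=abc123`, the function returns a dict with a `MAIL_KEY` entry that the program can look up by name.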

Hope that helps someone.

[+] icymatter|11 years ago|reply
Github has a very good cache. In the past, after I deleted a repository, I was still able to access some diff and commit information from my own activity pages. I had to ask the Github team to clear that page manually.
[+] jquast|11 years ago|reply
I'm very certain this is a hacker's account configured to follow a great deal of projects and people (2k projects, 1.3k users) for this very purpose -- a suspicious [redacted] [unknown] profile, https://github.com/trnsz
[+] jpetersonmn|11 years ago|reply
First time I tried to use github I uploaded my gmail password which I was using to send myself an email when something failed. I figured that there would be bots that would scoop up that information right away. Luckily I realized what I had done before people could get into my gmail.
[+] jpdlla|11 years ago|reply
Thinking of actually working on a tool for this. Will have a blacklist of "searches" that might contain sensitive data and perhaps notifying via the email of the committer or creating an issue on the repo. Anyone else want to get involved?
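
As a starting point, here is a small Python sketch of the matching step only (no GitHub API calls, no notification). The two patterns are illustrative assumptions: the first matches the commonly seen `AKIA` prefix of AWS access key IDs, and the second is a loose guess at quoted assignments to names like `api_key` or `password`. Both will miss real secrets and flag harmless strings:

```python
import re

# Illustrative patterns only; a real tool would need a much longer, tuned list.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_assignment": re.compile(
        r"(?i)(?:api[_-]?key|secret|passw(?:or)?d|token)\w*\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def find_secrets(text):
    """Return (line_number, pattern_name) pairs for lines that look sensitive."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

The same function could also run locally as a pre-commit check, which would catch the mistake before anything is pushed.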
[+] unknown|11 years ago|reply

[deleted]

[+] thefreeman|11 years ago|reply
They really don't. You can read them from environment variables (which would be set by your platform eg. heroku) or you could set up a vagrant instance with the necessary services to develop locally.
[+] dllthomas|11 years ago|reply
"API keys [...] need to be in your committed code"

Why?

[+] Killswitch|11 years ago|reply
> "API keys [...] need to be in your committed code"

No they don't.