Good on them. GitHub secrets cause a lot of problems. They will always create a better idiot but this idiot trap is long past due.
I also can’t wait until people base64 their creds to get past this. Explaining to someone that base64 isn’t encryption tends to be hard, so I imagine people will feel safe just base64-encoding their creds and checking them in.
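For anyone who ends up having that conversation: base64 is a reversible encoding with no key, so "protecting" a credential this way protects nothing. A minimal demonstration (the key below is AWS's documented example access key ID, not a live credential):

```python
import base64

# base64 is a reversible transformation: no key, no secrecy.
cred = "AKIAIOSFODNN7EXAMPLE"  # AWS's published example access key ID
encoded = base64.b64encode(cred.encode()).decode()
decoded = base64.b64decode(encoded).decode()

print(encoded)          # looks opaque to a human, but...
print(decoded == cred)  # ...decodes right back to the original
```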
base64 is far too much work. A new dev turned `"AKIAIOSFODNN7EXAMPLE"` into `"AK" + "IAIOSFODNN7EXAMPLE"` to make the security alert go away.
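Splitting the literal works against any scanner that matches the whole token as one string. A toy illustration of why, using a naive regex for AWS-style access key IDs (real scanners are more sophisticated than this):

```python
import re

# Naive scanner: match an intact AWS-style access key ID.
PATTERN = re.compile(r"AKIA[0-9A-Z]{16}")

intact = 'key = "AKIAIOSFODNN7EXAMPLE"'
split = 'key = "AK" + "IAIOSFODNN7EXAMPLE"'

print(bool(PATTERN.search(intact)))  # True: full token present in one piece
print(bool(PATTERN.search(split)))   # False: concatenation hides it
```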
Thankfully, the alert was sent to enough people it was caught by someone else, and the key was destroyed before someone outside could have fun with it.
I legitimately recently had to argue with a PM and his developers that a base64-encoded user ID isn’t considered a security best practice for API authentication. Even when I showed them how I can produce the “secret” myself, they kept arguing that I was wrong.
Does secret scanning also apply to public GitHub Action logs and Issues (or more generally, Checks logs)?
We found Action logs to be a much bigger threat now that many folks have learned not to embed secrets directly into the code and to use secret managers instead. But even then, the secrets retrieved in a step can be printed in plaintext if someone, for example, runs that step in debug mode.
Issues can also accidentally leak secrets via, for example, third-party code builders that print their output in an issue.
GitHub PM here. Right now we scan code, commit metadata, issues, and issue comments. We're expanding to other content types over time, with support for pull request bodies and comments coming in early 2023. Actions logs are on our list too, but will take a little longer.
(It's worth noting that any secrets in your Actions secret store will already be redacted in any Actions logs, so those won't leak there.)
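That redaction is conceptually just string substitution over known values. A rough sketch of the idea (not GitHub's actual implementation):

```python
def redact(line: str, secrets: list[str]) -> str:
    """Replace every occurrence of a known secret with a placeholder,
    the way a CI runner masks registered secret values in its logs."""
    for s in secrets:
        if s:  # never substitute on an empty string
            line = line.replace(s, "***")
    return line

known_secrets = ["hunter2"]
print(redact("connecting with password hunter2", known_secrets))
# connecting with password ***
```

The catch is that the runner can only mask values it knows about, which is exactly why secrets fetched at runtime from an external secret manager can still end up in logs, as noted upthread.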
Searching for creds can be tricky if they can't be readily distinguished from other text.
Can anyone think of a problem with generating customer API keys that have a known prefix that makes them more detectable?
For example, a key like "FooSecret.ZTNiMGM0NDI5OGZjMWMxNDlhZmJmNGM4OTk2ZmI5". I wouldn't think that'd open up any new attacks, but I'm no expert on the matter.
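One sketch of such a format, borrowing the hypothetical `FooSecret.` prefix from above and adding a short checksum (GitHub's own tokens embed a CRC32 checksum so scanners can cheaply reject look-alike strings offline); everything here is illustrative, not a spec:

```python
import secrets
import string
import zlib

ALPHABET = string.ascii_letters + string.digits  # base62
PREFIX = "FooSecret."  # hypothetical prefix from the comment above

def generate_key(entropy_chars: int = 30) -> str:
    """Random base62 body plus a 6-character base62-encoded CRC32
    checksum, so a scanner can validate candidates before alerting."""
    body = "".join(secrets.choice(ALPHABET) for _ in range(entropy_chars))
    crc = zlib.crc32(body.encode())
    chk = ""
    for _ in range(6):  # 62**6 > 2**32, so 6 digits always suffice
        crc, rem = divmod(crc, 62)
        chk += ALPHABET[rem]
    return PREFIX + body + chk

print(generate_key())
```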
GitHub PM here. We switched our own token format to something similar to the above in April of last year [1] and have been encouraging other service providers to do the same.
The big benefit of highly identifiable tokens is not just that we can alert on them, but that we can scan for them at pre-receive time and prevent them from leaking (by rejecting the push). We already have that functionality as part of GitHub Advanced Security, and are planning to make it available (for free) on public repos in 2023.
[1] https://github.blog/2021-04-05-behind-githubs-new-authentica...
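A pre-receive check for a highly identifiable prefix can be as simple as a pattern match over the pushed content. A minimal sketch, reusing the hypothetical `FooSecret.` prefix from upthread:

```python
import re

# Reject pushes whose content contains a token with a known,
# highly identifiable prefix -- the pre-receive idea described above.
TOKEN = re.compile(r"FooSecret\.[A-Za-z0-9]{20,}")

def check_push(pushed_text: str) -> bool:
    """Return True if the push is allowed (no token found)."""
    return TOKEN.search(pushed_text) is None

print(check_push("const url = 'https://api.example.com'"))  # True
print(check_push(
    "const key = 'FooSecret.ZTNiMGM0NDI5OGZjMWMxNDlhZmJmNGM4OTk2ZmI5'"
))  # False: push would be rejected
```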
I argued for something like that previously on HN, like adding a domain prefix `myservice.com_secretkeyhere`. This would allow automatic discovery of the reporting/revocation endpoint from the key. Then someone pointed out that you could just use an actual URL as your secret key and have that be the URL you visit to revoke it, and I think that is genius. (edit: sudhirj is the genius: https://news.ycombinator.com/item?id=28299624)
Next service I make that has API keys, I will make them look like `https://secret.myservice.org/ZTNiMGM0NDI5OGZjMWM`. POSTing to that URL revokes the key, a GET shows a form explaining what it is and a button to revoke the key.
One issue is that some email services mangle URLs specifically, and that would be bad for keys.
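The key-generation side of that design is a few lines to sketch. Everything here is hypothetical: the domain comes from the comment above, and the token length is arbitrary:

```python
import secrets

BASE = "https://secret.myservice.org/"  # hypothetical domain from above

def generate_key() -> str:
    """The API key *is* its own revocation URL."""
    return BASE + secrets.token_urlsafe(24)  # 24 random bytes -> 32 chars

key = generate_key()
print(key)

# If the key ever leaks, anyone who finds it can neutralize it:
#   GET  <key>  -> a page explaining what this is, with a revoke button
#   POST <key>  -> revokes the key immediately
```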
Another approach is to identify likely secrets based on the entropy of the strings. I used a tool that did precisely this once and found some, but I can't find it anymore.
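The entropy heuristic is straightforward to implement: compute Shannon entropy per candidate string and flag outliers (tools like truffleHog have used exactly this approach). A rough sketch; any real scanner would tokenize files and tune a threshold:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character of the string."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# English-ish identifiers score low; random-looking tokens score high.
for s in ["configuration", "ZTNiMGM0NDI5OGZjMWMxNDlhZmJmNGM4OTk2ZmI5"]:
    print(f"{shannon_entropy(s):.2f}  {s}")
```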
We use this at our company. Wildly successful at finding tokens for most of the usual suspects. If they're including secret blocking, it will prevent someone from doing the dumb as well.
One question/behavior - if the secret scanner found something and folks resolved it -> secret blocking is enabled -> and a developer does the dumb again, should it block the PR with the new secret? Wondering if we might have something misconfigured as I have seen new secrets get added after we enabled blocking.
Hello! I am an engineer on the Secret Scanning team, thanks for the kind words!
- "push protection" (as we call it) isn't available for free, and isn't part of this rollout.
- For folks who do pay, the flow may be: a developer tries to push, they bypass the block for that secret, and are now able to push. From there, an alert is created which they can resolve (maybe it is "used in tests").
- If the _same_ secret is pushed again, we won't block that push. We also won't create a new alert; however, a new location may be recorded within the resolved alert (if you click into it).
If you're seeing a push _not_ get blocked, what's most likely is that we just don't support that specific token as part of push protection (we have some much-needed improvements to make to the docs to clarify this). Since push protection sits in front of the developer, we try not to annoy them with high-false-positive tokens. There are a few other possibilities though, so it's hard to say.
Don't let the perfect be the enemy of the good - this will start out in a limited detection of course, but can easily be improved with other hashes and scanning over time.
What's the workflow where people accidentally commit secrets to their git repos? I'm not sure I've ever done it; do we count the "base_secret" type of things web frameworks put in their default app templates? Certainly the more common mistake I make is forgetting to add new files, so it's mildly amusing that other people apparently have the opposite problem.
People keep adding whole tmp/ directories or output binaries to repositories; accidents like this just happen. It's not a workflow, but here's a scenario: people trying to run some test against a real service to debug a weird issue will temporarily put credentials in the code and forget to remove them before committing the fix. Sure, someone will probably notice it in code review, but it's too late if the repo was public.
Lots of ways this happens, either accidentally or intentionally. I think the most common accident is due to forgetting to add a file to .gitignore and then using `git add .`. Intentionally, folks just embed secrets into code out of convenience while developing, and either never think twice, or forget to remove them before commit & push (which becomes kind of an accident).
Mostly accidental. You're working on a prototype, so to just get started you use a const at the top of your code with an API key. This gets checked in, and you then realise "oh shit", but by this point it's within git's tree. It can still be removed, but it's not a straightforward process.
Are a lot of "private-ish" repos (perhaps something that supports a real company) using GitHub and not self-hosting? I presume this is the case, but it seems dumb.
They have a list of supported secrets they can find via automated scans:
https://docs.github.com/en/code-security/secret-scanning/sec...
Really easy to just grep through something looking for that prefix
Also what about 2FA secrets like TOTP/WebAuthn?
Why not just self-host?