Linus on Git and SHA-1

[+] joatmon-snoo|9 years ago|reply

The actual mailing list discussion thread can be found here, and is infinitely more informative than any of the bull being spouted in this thread: http://public-inbox.org/git/20170226004657.zowlojdzqrrcalsm@...

[+] azernik|9 years ago|reply

My takeaway from all that - the git project is internally taking steps to mitigate vulnerabilities from this particular attack (making it harder to insert the arbitrary binary data necessary into git metadata), but a) is just throwing up their hands at the problem of projects that store binary blobs like image data in their repos, and b) is not taking this as a signal that more serious sha-1 attacks are on the horizon and they should speed up their hash-replacement efforts.

This latter leads into the problems with Linus' positions in particular. In that thread, does not take seriously the threats that this poses to the broader git userbase, because he only seems to care about the kernel use-case: trusted hosting infrastructure at kernel.org (itself an iffy assumption, given previous hacks and the use of mirrors), and the exclusive storage of human-readable text in the repo which makes binary garbage harder to sneak in. These do not apply to most users of git. His rather extreme public position (paraphrased, "our security doesn't depend on SHA1") is even more troubling - it absolutely does depend on SHA1, this just isn't (yet) a strong enough attack to absolutely screw over the kernel. A stronger collision attack (eg a chosen-prefix as opposed to identical-prefix, or godforbid a pre-image attack) would absolutely invalidate the whole git security model.

[+] bascule|9 years ago|reply

Linus's transition plan seems to involve truncating SHA-256 to 160-bits. This is bad for several reasons:

- Truncating to 160-bits still has a birthday bound at 80-bits. That would still require a lot more brute force than the 2^63 computations involved to find this collision, but it is much weaker than is generally considered secure

- Post-quantum, this means there will only be 80-bits of preimage resistance

(Also: if he's going to truncate a hash, he use SHA-512, which will be faster on 64-bit platforms)

Do either of these weak security levels impact Git?

Preimage resistance does matter if we're worried about attackers reversing commit hashes back into their contents. Linus doesn't seem to care about this one, but I think he should.

Collision resistance absolutely matters for the commit signing case, and once again Linus is downplaying this. He starts off talking about how they're not doing that, then halfway through adding a "oh wait but some people do that", then trying to downplay it again by talking about how an attacker would need to influence the original commit.

Of course, this happens all the time: it's called a pull request. Linus insists that prior proper source code review will prevent an attacker who sends you a malicious pull request from being able to pull off a chosen prefix collision. I have doubts about that, especially in any repos containing binary blobs (and especially if those binary blobs are executables)

Linus just doesn't take this stuff seriously. I really wish he would, though.

[+] infinity0|9 years ago|reply

> "A hash that is used for security is basically a statement of trust [..] In contrast, in a project like git, the hash isn't used for "trust". I don't pull on peoples trees because they have a hash of a4d442663580. Our trust is in people, and then we end up having lots of technology measures in place to secure the actual data."

This is horseshit, and Linus should not be saying these hugely misleading statements about security principles.

The point of a hash is to remove the need for trust between the trusted person who tells you the hash and the infrastructure you get the actual data that was hashed, from (edit: and, between you and the latter).

In other words, once you get a good non-colliding hash from a trusted person, then you don't need to worry about malicious infrastructure sending you bad data claiming to be the source of that hash.

Linus trusting Tytso to sign the commit object that references the SHA-1 of the tree object, says nothing about whether the infrastructure served him the tree object correctly. Sure, he might also trust the infrastructure providers, but when he says "trusts people" it does not sound like that is what he means. And even if he trusts the infrastructure providers, with a good hash HE DOESN'T HAVE TO.

The "trust" wording is serious horseshit.

(edit: there is also the case of people downloading "linux" from random git repos in the future. Right now if you GPG-sign a commit or tag, it has SHA-1 references to the tree object underneath it. Once SHA-1 is more broken it basically means you shouldn't trust random git repos across the internet to give you good content, even if it's "signed by Linus".)

[+] tytso|9 years ago|reply

That's not the plan. That was an idea that was thrown out if this was an emergency (it's handling different length hashes, and doing so that we don't have to force a flag day conversion which is hard), but once people realized that in fact, the sky was not following, the plan which Linus outlined in his G+ post was devised --- which does not involve truncating a 256-bit hash.

[+] the_mitsuhiko|9 years ago|reply

> Linus's transition plan seems to involve truncating SHA-256 to 160-bits.

This is not the plan. This is the backwards compatible system for tools that can only deal with 40 characters.

[+] ksherlock|9 years ago|reply

On the mailing list he suggested displaying 160 bits for backwards compatibility with existing tools. The full 256 bits would be used internally (and displayed externally with the proper --flag)

[+] simplehuman|9 years ago|reply

> Linus just doesn't take this stuff seriously. I really wish he would, though.

Can't downvote this enough. This is plain FUD. Did you even read the complete thread on the git mailing list? This was just one proposal by him.

[+] mappu|9 years ago|reply

>(Also: if he's going to truncate a hash, he use SHA-512, which will be faster on 64-bit platforms)

BLAKE2 is faster still! It's also at least as secure as SHA-3, and produces any choice of output size up to 512bit.

[+] chainsaw10|9 years ago|reply

> reversing commit hashes back into their contents

Somewhat off topic, but is this actually possible?

Given hashing is inherently lossy, I'm inclined to assume it's not possible for anything must longer than a password, but commits are text, which I suppose is low entropy per character, so I don't know.

[+] hueving|9 years ago|reply

>Preimage resistance does matter

Is there a preimage weakness though? I thought this attack only reduced collision resistance.

[+] hermitdev|9 years ago|reply

Although, I generally defer to someone such as Linus in having far more domain knowledge such as this, but I'm concerned willingness to just drop bits of the hash like this. I mean 20-30 years ago, I could quasi understand for the sake of performance, but are we really so concerned about performance of a few clock cycles vs opening ourselves to a known vulnerable attack?

[+] Dylan16807|9 years ago|reply

- Truncating to 160-bits still has a birthday bound at 80-bits. That would still require a lot more brute force than the 2^63 computations involved to find this collision, but it is much weaker than is generally considered secure

What level is considered secure, then? The numbers for O(time) and O(space) should be many orders of magnitude apart to represent the relative costs.

[+] unknown|9 years ago|reply

[deleted]

[+] unknown|9 years ago|reply

[deleted]

[+] amluto|9 years ago|reply

> - Post-quantum, this means there will only be 80-bits of preimage resistance

Post-quantum, it's only ~2^53 work to find a collision. IMO that's worse.

[+] unknown|9 years ago|reply

[deleted]

[+] paulddraper|9 years ago|reply

> especially in any repos containing binary blobs

Yeah, that part is the real flaw in the argument.

[+] gkya|9 years ago|reply

If a repo contains binary blobs, especially executables, well, that's very bad practices right there. Also, how can somebody else modify a binary in a meaningful way and send a patch to it? How can you review a patch to a binary file before applying? I'd say that any sane project, especially if open source, would not include binaries (maybe apart from images), and even if it did, would not accept patches to them (if the members of a project were talking about changing, say, an icon, valid images would be exchanged on a mailing list/issue tracker, nobody would bother making, sending and applying binary diffs; then somebody w/ commit bit just commit it).

Let me put it more simply: if you're accepting patches for binary files in your repos you don't care about security at all. Maybe unless if you know how to decode machine code/JPEG manually.

Also, the proposition was to use the full hash internally, and truncate the representation. Nobody other than git itself uses full hashes anyways.

[+] unknown|9 years ago|reply

[deleted]

[+] runeks|9 years ago|reply

One thing SHA-256 has going for it is that millions can be made from finding pre-image weaknesses in it, because it's used in Bitcoin mining. If you could "figure out" SHA-256, and use it to take over Bitcoin mining, you'd make $2M the first 24 hours, at current rates. And if you play it wise, it could take a long time before anyone figure out what's going on.

With regards to market price for a successful attack, I don't think any hash function stands close to SHA-256. And for that reason I think it would be the right choice.

[+] wolf550e|9 years ago|reply

https://twitter.com/veorq/status/834872988445065218

[+] kzrdude|9 years ago|reply

SHA-512/256 is faster and has the same output length

[+] maxander|9 years ago|reply

I don't really get the threat model here. If an attacker is pushing commits into your repository, you're long since toast on all possible security fronts, right? Is there anything nefarious they could accomplish through hash collisions that couldn't be done simply by editing commit history?

[+] leoh|9 years ago|reply

Not really. From Linus — I think the most important point that has not been discussed extensively:

> But if you use git for source control like in the kernel, the stuff you really care about is source code, which is very much a transparent medium. If somebody inserts random odd generated crud in the middle of your source code, you will absolutely notice.

[+] azernik|9 years ago|reply

If they edit the commit history and you're using a secure hash algorithm, then the hash of the current commit will change and no longer match the signed tag your trusted maintainer sent you.

[+] hannob|9 years ago|reply

One thing that I think is worth mentioning: This was completely avoidable. Git isn't that old, it wasn't taken by surprise by the SHA1 attacks.

The first paper from Wang et al, which should've put SHA1 to rest, was published in 2004, the year before the first ever Git version was released. It could have been easy: Just take a secure hash from the beginning.

[+] ploxiln|9 years ago|reply

If anyone is really interested in more assurance of git commit contents, there's "git-evtag", which does a sha-512 hash over the full contents of a commit, including all trees and blob contents.

https://github.com/cgwalters/git-evtag

[+] simias|9 years ago|reply

While this post sounds very reasonable to me there's one point that I really don't get: why does he keep saying that git commit hashes have nothing to do with security?

If he believes that, why does git allow signing tags and commits and why does Linus himself sign kernel release tags? Isn't that the very definition of "using a hash for security"?

[+] hackuser|9 years ago|reply

Related, from Mozilla:

* The end of SHA-1 on the Public Web

https://blog.mozilla.org/security/2017/02/23/the-end-of-sha-...

As announced last fall, we’ve been disabling SHA-1 for increasing numbers of Firefox users since the release of Firefox 51 using a gradual phase-in technique. Tomorrow [Feb 24th], this deprecation policy will reach all Firefox users. It is enabled by default in Firefox 52.

[+] azernik|9 years ago|reply

To be fair (although I've been commenting angrily about git's continued use of SHA-1 elsewhere) it's a lot easier for a browser to change hash algorithms than for git.

[+] luckydude|9 years ago|reply

Linus is a little behind the times with this comment:

``Other SCM's have used things like CRC's for error detection, although honestly the most common error handling method in most SCM's tends to be "tough luck, maybe your data is there, maybe it isn't, I don't care".''

BitKeeper has an error detection (CRC per block) and error correction (XOR block at the end) system. Any single block loss is correctable. Block sizes vary with file size so large files have to lose a large amount of data to be non-correctable.

[+] butwhynotmore|9 years ago|reply

In the specific case of cryptography where it's unknown how bulletproof the algorithm will be why not use multiple hash functions? Perhaps using the top 10 best hash functions of the day. That way you're not putting all your eggs in one basket and if nefarious collisions are able to be created in the future you still have the other hash functions to both "trust" and double check against. It's even more unlikely that nefarious collisions will be able to be constructed that collide all the other hash functions as well. You could just append the hashes to each other or put them in a hash table or something. Maybe my computer science is not up to snuff but it seems like this would provide more resiliency against future and non-public mathematical breakthroughs as well as increased computing power such as quantum computing. Yes, it would take a little longer to compute all the hashes in day to day use, but with the benefit of a more robust system both now and in the future.

[+] theseoafs|9 years ago|reply

Have there been writings on what exactly git's migration strategy to a new hash function will be? Apparently they have a seamless transition designed that won't require anyone to update their repositories, which seems like a pretty crazy promise in the absence of details.

[+] keeperofdakeys|9 years ago|reply

In git the SHA-1 hash is simply an identifier for an object - it's used in the filename, but not stored in the object. And when a commit or tree object references others, it's just a name that can be looked up in the database. So a commit object hashed with SHA-256 can easily reference a previous commit that was hashed with SHA-1.

During the switch, a bit of deduplication may be lost. But the only interesting issue I can see is how git fsck will tell which hash an object was created with when verifying the hash (maybe with length?).

[+] claar|9 years ago|reply

Also see discussion of Linus's earlier comments at https://news.ycombinator.com/item?id=13719368

[+] godzilla82|9 years ago|reply

Newbie question .. can some one please help me understand the attack scenario. if I, as the attacker, want to inject malicious code/binary into a git repo, then I need to write my malicious code/binary in such a way that the resultant hash collides with one of the commits (? Or the last one?) in the repo. Is this correct?

[+] jmount|9 years ago|reply

Probably isn'y the sky falling. But if knowing the length fixed all hash function issues then cryptographic hashes would just use a some more bits for length.

[+] frik|9 years ago|reply

Can someone correct me. SVN/Subversion and GIT are affected by SHA-1 problem. SVN uses SHA-1 internally, but exposes only a numeric int as revision. GIT uses SHA-1 internally and as revision. So if someone commit a modified PDF that collides he can run havoc on both SVN and GIT at the moment. It seems easier to fix the issue in SVN than GIT.

[+] hannob|9 years ago|reply

It's a somewhat different issue.

Git can probably not be havoced by committing two colliding files (and doing so would require doing another chosen prefix attach with a git blob header). But git looses cryptographic integrity promises due to this attack (aka: you can have different source trees with different histories leading to the same top commit hash). svn never had any cryptographic integrity to begin with.

[+] kzrdude|9 years ago|reply

Sure, here's a correction: There has been no evidence of havoc or repo corruption in git.

It appears that to even try to attack git you'd need to spend the same amount of work again ($110K or what it was) to create a new collision with the right git object headers.

[+] markiniotome|9 years ago|reply

[deleted]

[+] yuhong|9 years ago|reply

I do wonder how many outside of crypto circles know about SHA-2 circa 2004.

[+] unknown|9 years ago|reply

[deleted]

[+] colin_fraizer|9 years ago|reply

This, btw, is why we have e-cigarette bans. The fact that the generally high-IQ, paid-to-think-about-subtle-categorization community of software developers needs to be inoculated against the "I Heard SHA-1 Was Bad Now" meme, should serve as a reminder for why most things should not be managed by democracy.

(Yeah, I know this will be read as a plea for monarchy and downvoted. It simply proves my point: people are WAY too subject to errors in the classes (1) "I hate him because he said something 'bad' about something 'good'." and (2) "I hate him because he said something 'good' about something The Tribe now knows is 'bad.')

[+] grzm|9 years ago|reply

Save yourself some downvotes and remove the mention that you expect them.

[+] dboreham|9 years ago|reply

Um what? Software written in the past 20 years has a baked-in assumption that the length of some ID can't change?

170 comments