At Open Whisper Systems, we wrote a small open source gradle plugin called "gradle-witness" for this reason. Not just because dependencies could be transported over an insecure channel, but also because dependencies could be compromised if the gradle/maven repository were compromised.
Hi moxie, might I ask if you've considered SHA-384 instead?
If I understand correctly, SHA-256 is part of the SHA-2 family of hash algorithms, and like SHA-1, when used alone it is subject to length extension attacks.
SHA-384 is also a member of the SHA-2 algorithm family, but is immune to length extension attacks because it runs with an internal state size of 512 bits -- by emitting fewer bits than its total internal state, length extensions are ruled out. (Wikipedia has a great table for clarifying all these confusing names and families of hashes: [5].) Other hashes like BLAKE-2 [1], though young, also promise built-in immunity to length-extension attacks. mdm [2] is immune to this because the relevant git datastructures all include either explicit field lengths as a prefix, or are sorted lists with null terminators, both of which defuse length extension attacks by virtue of breaking their data format if extended.
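The truncation is easy to see with the JDK's own MessageDigest API (a quick illustrative snippet for unfamiliar readers, unrelated to gradle-witness itself):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestSizes {
    public static void main(String[] args) throws NoSuchAlgorithmException {
        // SHA-512 emits its full 64-byte (512-bit) internal state.
        // SHA-384 runs the same 512-bit-wide compression function but
        // truncates the output to 48 bytes -- that withheld state is
        // what rules out length-extension attacks.
        for (String alg : new String[] {"SHA-256", "SHA-384", "SHA-512"}) {
            MessageDigest md = MessageDigest.getInstance(alg);
            System.out.println(alg + " output: " + md.getDigestLength() + " bytes");
        }
    }
}
```

Note that SHA-256's 32-byte output is its entire 256-bit internal state, which is exactly why it is extendable.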
Not that it's by any means easy to find a SHA-256 collision at present; but should collisions be found in the future, a length extension attack will increase the leverage for using those collisions to produce binaries that slip past this verification. An md5 Collision Demo[3] by Peter Selinger is my favourite site for concretely demonstrating what this looks like (though I think this[4] publication by CITS mentions the relationship to length extension more explicitly).
(I probably don't need to lecture to you of all people about length extensions :) but it's a subject I just recently refreshed myself on, and I wanted to try to leave a decent explanation here for unfamiliar readers.)
--
I'm also curious how you handled management of checksums for transitive dependencies. I recall we talked about this subject in private back in April, and one of the concerns you had with mdm was the challenge of integrating it with existing concepts of "artifacts" from the maven/gradle/etc world -- though there is an automatic importer from maven now, mdm still requires explicitly specifying every dependency.
Have you found ways to insulate gradle downloading updates to plugins or components of itself?
What happens when a dependency adds new transitive dependencies? I guess that's not a threat during normal rebuilds, since specifying hashes ahead of time already essentially forbids loosely semver-ish resolution of dependencies at every rebuild, but if it does happen during an upgrade, does gradle-witness hook into gradle deeply enough that it can generate warnings for new dependencies that aren't watched?
This plugin looks like a great hybrid approach that keeps what you like from gradle while starting to layer on "pinning" integrity checks. I'll recommend it to colleagues building their software with gradle.
P.S. is the license on gradle-witness such that I can fork or use the code as inspiration for writing an mdm+gradle binding plugin? I'm not sure if it makes more sense to produce a gradle plugin, or to just add transitive resolution tools to mdm so it can do first-time setup like gradle-witness does on its own, but I'm looking at it!
--
Edited: to also link the wikipedia chart of hash families.
I am totally happy donating $10 to whisper systems for this work, instead of being forced to donate $10 to the Apache Foundation (although a worthy cause) to get https access to Maven Central.
For Leiningen at least the goal is eventually to be able to flip a switch that will make it refuse to operate in the presence of unsigned dependencies. We're still a ways away from that becoming a reality, but the default is already to refuse to deploy new libraries without an accompanying signature.
Edit: of course, the question of how to determine which keys to trust is still pretty difficult, especially in the larger Java world. The community of Clojure authors is still small enough that a web of trust could still be established face-to-face at conferences that could cover a majority of authors.
The situation around Central is quite regrettable though.
I'm pretty surprised that this article is news. Sonatype has been open about SSL for Maven Central since there has been Nexus or maybe even longer. I remember Jason van Zyl talking about this seven or more years ago.
SSL would have partially mitigated this attack, but it's not a full solution either. SSL is transport layer security -- you still fully trust the remote server not to give you cat memes. What if this wasn't necessary? Why can't we embed the hash of the dependencies we need in our projects directly? That would give us end-to-end confidence that we've got the right stuff.
This is exactly why I built mdm[1]: it's a dependency manager that's immune to cat memes getting in ur http.
Anyone using a system like git submodules to track source dependencies is immune to this entire category of attack. mdm does the same thing, plus works for binary payloads.
Build injection attacks have been known for a while now. There's actually a great publication by Fortify[2] where they even gave it a name: XBI, for Cross Build Injection attack. Among the high-profile targets even several years ago (the report is from 2007): Sendmail, IRSSI, and OpenSSH! It's great to see more attention to these issues, and practical implementations to double-underline both the seriousness of the threat and the ease of carrying out the attack.
Related note: signatures are good too, but still actually less useful than embedding the hash of the desired content. Signing keys can be captured; revocations require infrastructure and online verification to be useful. Embedding hashes in your version control can give all the integrity guarantees needed, without any of the fuss -- you should just verify the signature at the time you first commit a link to a dependency.
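A minimal sketch of such a pinned-hash check in Java (a hypothetical helper, not code from mdm or gradle-witness; `HexFormat` needs Java 17+):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class PinCheck {
    // Verify a downloaded artifact against a hash pinned in version control.
    static boolean matchesPin(Path jar, String expectedSha256Hex)
            throws IOException, NoSuchAlgorithmException {
        byte[] actual = MessageDigest.getInstance("SHA-256")
                .digest(Files.readAllBytes(jar));
        // MessageDigest.isEqual does a constant-time comparison; for a
        // build-time check a plain Arrays.equals would also do.
        return MessageDigest.isEqual(actual,
                HexFormat.of().parseHex(expectedSha256Hex));
    }
}
```

The key property: the expected hash lives in your repo, so the download channel and the remote server both drop out of the trust equation.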
"Why can't we embed the hash of the dependencies we need in our projects directly?"
There's a lot of stuff in Maven, like the versions plugin and the release plugin, to update dependencies to the latest version. This stuff is useful for continuous integration and automated deployment, especially when your project is split into lots of modules to allow code reuse.
With code signing, you can (or hypothetically could, I don't know if anyone does this) check the latest version is signed by the same key as the previous version - whereas just pinning the hash wouldn't allow that.
I agree pinning the hash is useful if the signing key is captured.
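That key-continuity idea is essentially trust-on-first-use. A toy sketch of it (hypothetical; nothing in Maven or Gradle does this out of the box):

```java
import java.util.HashMap;
import java.util.Map;

// Trust-on-first-use key continuity: remember which signing key first
// published each artifact, and flag any later release signed by a
// different key. Fingerprints here are opaque strings for simplicity.
public class KeyContinuity {
    private final Map<String, String> firstSeenKey = new HashMap<>();

    /** Returns true if the artifact's signer matches the first key we saw. */
    public boolean checkSigner(String artifact, String keyFingerprint) {
        String known = firstSeenKey.putIfAbsent(artifact, keyFingerprint);
        return known == null || known.equals(keyFingerprint);
    }
}
```

Unlike a pinned hash, this accepts new versions automatically as long as the publisher's key stays the same, which is exactly the trade-off being discussed.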
Perhaps as a stopgap Maven Central (or a concerned third party?) could publish all of the SHA1 hashes on a page that is served via HTTPS. This would at least allow tools to detect the sort of attack described in the article.
Evilgrade (https://github.com/infobyte/evilgrade) is a similar tool that works on a wider variety of insecure updaters. Perhaps a module could be written? Maybe one already exists, I haven't played with it in a while
I'm torn on how I feel about security being a paid feature in this case. Here the onus is being placed on the user, yet many won't be conscious of the choice they're making.
If you aren't paying money, you aren't the user, you are a product.
Freemium models often suck because of stuff like this[1]. But if the "users" would just consider it normal to pay money then we wouldn't have crazy things going on where people providing critical infrastructure services need to figure out how to "convert" their "users." Instead, say, every professional Java shop would pay $100 a year or so for managed access. Projects that want to use it like a CDN so their users could download would pay a fee to host it.
They have bills to pay. They'll cover them one way or the other. If we pay directly at least we know what the game is.
[1] They could be inserting advertising into the jars. Hey, at least it would still be a "free" service, right?
That's a little paranoid. Let's see, we'll completely ruin our rep and our core business activity just so you're forced to donate--not to us, but to this open source group over here. Dude, put down the pipe.
My main experience with Maven has been downloading some source code, and having to use Gradle to compile it. It went and downloaded a bunch of binaries, insecurely. There were no actual unsatisfied dependencies; it was just downloading pieces of Gradle itself.
I would've much rather had a Makefile. Build scripts and package managers need to be separate.
I will join the small chorus agreeing that build scripts and package managers should be separate. Most folks I work with disagree.
Curious if anyone knows of any well done takes on this. In either way. (If I'm actually wrong, I'd like to know.) (I fully suspect there really is no "right" answer.)
jCenter is the new default repository used with Android's gradle plugin. I haven't used it myself yet, but it looks like the site defaults to HTTPS for everything: https://bintray.com/bintray/jcenter
Full disclosure - I am a developer Advocate with JFrog, the company behind Bintray.
So, jcenter is a Java repository in Bintray (https://bintray.com/bintray/jcenter), which is the largest repo in the world for Java and Android OSS libraries, packages and components. All the content is served over a CDN, with a secure https connection.
JCenter is the default repository in Groovy Grape (http://groovy.codehaus.org/Grape), built in to Gradle (the jcenter() repository), and very easy to configure in every other build tool (except maybe Maven), and it will become even easier very soon.
Bintray has a different approach to package identification than the legacy Maven Central. We don't rely on self-issued key pairs (which can be generated to represent anyone, and are never actually verified in Maven Central). Instead, similar to GitHub, Bintray gives a strong personal identity to any contributed library.
If you really need to get your package to Maven Central (for supporting legacy tools) you can do it from Bintray as well, in a click of a button or even automatically.
Nice, I was going to ask if maybe Google or someone invested heavily in Android could step up and provide a secure source of dependencies for everyone.
The biggest problem with this policy is that new users, or even experienced ones, are likely not aware of it. This is a very serious problem that should be addressed quickly.
edit: and with websites everywhere routinely providing SSL, it seems crazy that it has to be a paid feature for such a critical service.
Sort of. That also carries non-malicious risks: a broken connection turns "rm -r /var/lib/cool/place" into "rm -r /var/" and the shell processes that.
Downloading over HTTP would not be an issue (as far as integrity is concerned) if maven validated the downloads against some chain of trust. But apparently it does not.
Now I am wondering what tool actually uses those .asc files that I have to generate using mvn gpg:sign-and-deploy-file when I upload new packages to sonatype...
If I understand this correctly, maven-based builds can contain dependencies on libraries hosted on remote servers. The golang build system has (or had) something similar too. Witnessing this trend take hold is astonishing and horrifying in equal parts. Not just as a security problem (which is clearly obvious) but also as a huge hole in software engineering practices.
How can anyone run a production build where parts of your build are being downloaded from untrusted third party sources in real time? How do you ensure repeatable, reliable builds? How do you debug production issues with limited knowledge of what version of various libraries are actually running in production?
Java developers kind of laugh when I explain to them that Linux distros struggle to bootstrap Maven from source, due to it being a non-trivial tool that depends on hundreds of artifacts to build.
The point is, what does it matter that your repo is local, or that your jars are secured, if you got the tool, maven itself, in binary form from a server you don't control?
That is the whole point of Linux distro package managers. It is not only about dependencies. It is about securing the whole chain and ensuring repeatability.
Maven's design, unlike ant's, forces you to bootstrap it from binaries. Even worse, maven itself can't handle building a project _AND_ its dependencies from source. Why would the rest of the infrastructure matter, then?
Yes, Linux distros build gcc and ant using a binary gcc and a binary ant. But it is always the previous build, so at some point in the chain it ends with sources and not with binaries.
And this is not a complaint about Maven's idea and concept. It would be fine if it depended on a few libraries and had a simple way of building itself, instead of needing the binaries of half of the stuff it is supposed to build in the first place (hundreds), just to build itself.
"How can anyone run a production build where parts of your build are being downloaded from untrusted third party sources in real time? How do you ensure repeatable, reliable builds?"
By not downloading everything from maven central in real time. Companies usually run their own repository and builds query that one. Central is queried only if the company-run repository is missing some artifact or they want to update libraries. How much bureaucracy stands between you and company-run repository upgrades depends on company and project needs.
As for production, does anyone compile stuff on production? I thought everyone ships compiled jars there. You know what exact libs are contained in that jar; no information is missing.
Hosting your own makes sense for multiple reasons: you can be assured what code you are getting, you aren't limited by bandwidth rates of remote providers, and you get to control up/down time. The first is a must; the second and third make life more tolerable.
In golang-land it is popular to deal with this by vendoring all the packages you depend on. There are several tools to manage this like godep. This is my preferred method as it allows for the reliable, repeatable build you are talking about.
There are other schools of thought, like pinning the remote repos to a specific commit ID. These are better than nothing, but still depend on 3rd-party repos, which I think is too risky for production code. It is great for earlier stages of a project when you are trying to work out the libraries you will use and also need to collaborate.
A couple of years ago we were trying to use BigCouch in a product. The erlang build tool was happy to have transitive dependencies that were just pointing at github:branch/HEAD. It got to the point where we'd build it on a test machine, and then just copy the binary around.
npm has the same problem of sending packages over http, but it's even worse since on average each node package uses about a billion other packages and because injecting malicious code in JavaScript is incredibly easy.
And to be clear, just http here is not the issue. It's http combined with lack of package signing. apt runs over http, but it's a pretty secure system because of its efficient package signing. Package signing is even better than https alone since it prevents both MITM attacks and compromise of the apt repository.
In fact, apt and yum were pretty ahead of their time with package signing. It's a shame others haven't followed their path.
npm by default uses HTTPS, and has for more than 3 years. It's a little confusing because the loglines all say "http" in green, but if you actually look at the URLs being downloaded they are all to https://registry.npmjs.org/
moxie | 11 years ago
https://github.com/whispersystems/gradle-witness
It allows you to "pin" dependencies by specifying the sha256sum of the jar you're expecting.
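From a quick read of the gradle-witness README, wiring it in looks roughly like this (the coordinates are made up and the hash is a placeholder for a real sha256sum, so treat this as a sketch rather than copy-paste material):

```groovy
// build.gradle -- after adding the witness plugin jar to the
// buildscript classpath
apply plugin: 'witness'

dependencyVerification {
    verify = [
        // 'group:name:sha256sum-of-the-jar'
        'com.example:some-library:<expected sha256 hex of the jar>',
    ]
}
```

With entries like these committed to version control, a tampered download fails the build regardless of the transport or the repository it came from.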
heavenlyhash | 11 years ago
--
[1] https://blake2.net/
[2] https://github.com/polydawn/mdm/
[3] http://www.mscs.dal.ca/~selinger/md5collision/
[4] http://web.archive.org/web/20071226014140/http://www.cits.ru...
[5] https://en.wikipedia.org/wiki/SHA-512#Comparison_of_SHA_func...
notthetup | 11 years ago
technomancy | 11 years ago
weavejester | 11 years ago
akerl_ | 11 years ago
http://www.obdev.at/products/littlesnitch/index.html
brianefox | 11 years ago
ontoillogical | 11 years ago
Brian, are you speaking as a representative of Sonatype, or are you a 3rd party?
needusername | 11 years ago
peeters | 11 years ago
brl | 11 years ago
heavenlyhash | 11 years ago
[1] https://github.com/polydawn/mdm/
[2] https://www.fortify.com/downloads2/public/fortify_attacking_...
michaelt | 11 years ago
femto113 | 11 years ago
jontro | 11 years ago
finnn | 11 years ago
MrSourz | 11 years ago
The tiff mentioned in the article was interesting to read: https://twitter.com/mveytsman/status/491298846673473536
avz | 11 years ago
danielweber | 11 years ago
unknown | 11 years ago
[deleted]
ternaryoperator | 11 years ago
jimrandomh | 11 years ago
yourad_io | 11 years ago
This. Especially when there are broken links, you're gonna have a bad (and long) time.
taeric | 11 years ago
dmacvicar | 11 years ago
jc4p | 11 years ago
jbaruch_s | 11 years ago
sgarman | 11 years ago
tdicola | 11 years ago
tensor | 11 years ago
sitkack | 11 years ago
clarkm | 11 years ago
akerl_ | 11 years ago
yonran | 11 years ago
stevekemp | 11 years ago
Did you know that Xine, the media player, has a similar thing behind the scenes? I didn't.
http://blog.steve.org.uk/did_you_know_xine_will_download_and...
pjlegato | 11 years ago
How hard would it be to just mirror it to S3 and use it from there via HTTPS?
chetanahuja | 11 years ago
dmacvicar | 11 years ago
buerkle | 11 years ago
watwut | 11 years ago
SoftwareMaven | 11 years ago
eikenberry | 11 years ago
jwhitlark | 11 years ago
ShardPhoenix | 11 years ago
jnbiche | 11 years ago
seldo | 11 years ago
GaryRowe | 11 years ago
It's available under MIT licence: https://github.com/gary-rowe/BitcoinjEnforcerRules
sitkack | 11 years ago
0x0 | 11 years ago
sitkack | 11 years ago