Backdooring Rust crates for fun and profit

[+] PragmaticPulp|4 years ago|reply

I use “cargo … —-locked” to install things using the dependencies from the Cargo.lock file, which includes specific commit hashes for dependencies. Avoids things like the 0.0.1 problem or even replaced crates. Need to be careful to watch for actual security updates, though.

I really wish crates.io would have at least launched with a name spacing feature. This wouldn’t solve every spoofing or typosquatting issue, but it would go a long way toward improving the situation.

There’s a separate issue of crates.io squatting. One person famously registered hundreds (or thousands? Tens of thousands) of common words as crate names on crates.io and has been squatting them ever since. Those names are effectively unavailable for use but also completely useless because they don’t contain anything.

It’s also becoming a huge problem for abandoned crates. New forks have to choose a completely different, less intuitive name because they can’t just namespace their alternative. As old crates get abandoned, this leads to weird situations where the newest, best maintained crate has the least obvious crate name. It takes work to find the good crate some times because the best named crates might just be the oldest, most abandoned ones

[+] mikepurvis|4 years ago|reply

I feel like "single shared namespace" works best when the ecosystem is managed collaboratively, with a clear leadership to make calls on who gets what name— for example, something like a Linux distro, where there are Replaces/Provides metadata specifically to facilitate these kinds of transitions and avoid being stuck forever with crappy legacy nonsense.

But this doesn't work at all in a free-for-all environment like PyPI, NPM, or Crates, where anyone can just grab a name and then have it in perpetuity.

IMO the Docker ecosystem got this right, with baking in a domain name as part of the container, and insisting that everyone on docker.io use a vendor/product convention. This meant that the toplevel namespace was reserved for them to offer (or delegate the offering of) specific blessed container images, much more in line with how distro packaging might work.

    # Get the "official" image, whatever that means (but you trust Docker Inc, so yay).
    docker pull nginx

    # Get an image supplied by a specific vendor.
    docker pull bitnami/nginx

    # Get an image from a different server altogether; maybe it's your company, or you don't trust Docker Inc after all?
    docker pull quay.io/jitesoft/nginx

Maybe the big-flat-namespace thing is still a years-later reaction against huge and unnecessary hierarchies in Java land? I think the ideal is not to permit infinite depth, but perhaps to insist on 2-3 levels.

[+] moojd|4 years ago|reply

This one is frustrating because this is an issue that has been solved many times before and I hate seeing it repeated in every new package manager. A vendor name should always be required and the top level should be reserved for official/standard packages.

I want all of the following from a package manager:

1. Required vendor/namespace for third party packages

2. No multiple package versions. If there is a version conflict between transitive dependencies of a package because of semver, you should not be able to install that package.

3. Lock file and a separate 'install' command for installing the locked versions and an 'upgrade' command for updating versions via semver

4. Upgrade command should support a --dry-run option that lists the packages and versions that are to be updated and a --diff that lets you preview the code changes.

[+] belter|4 years ago|reply

I think if somebody decides to act as a vandal I can too?

https://crates.io/users/swmon

Edit: Lots of love for this user in the thread....

https://github.com/swmon/Charles-Crack/pull/1

[+] rectang|4 years ago|reply

My understanding is that validating identity for package authors is a hard problem thus expensive to solve robustly, and the crates.io folks have hitherto deferred tackling it in earnest. That is arguably a responsible approach up to a point, in that they haven't committed prematurely to something half-baked.

As described elsethread, there is prior art — Maven's identity verification is substantially better: https://news.ycombinator.com/item?id=29266591

Validating ownership of a namespace reliably enough that it is difficult to spoof is tough. It's possible for PGP creds to be stolen. But then at least the keys can be revoked, and old packages signed with a new key.

[+] ziml77|4 years ago|reply

The lack of namespacing in crates.io has always been a strange decision to me. I hope at some point they decide that it's worth it to introduce namespaces. The could use some an aliasing mechanism to avoid breaking any existing package references.

[+] Lifelarper|4 years ago|reply

https://users.rust-lang.org/t/name-squatting-on-the-crates-i...

This a very old discussion if you keep digging through the links.

[+] nine_k|4 years ago|reply

It's heartbreaking to see crates.io repeat nearly every mistake which npm has committed. I did not expect that from the Rust crowd which usually tends to design things carefully and with correctness and corner cases in mind.

[+] dathinab|4 years ago|reply

> crates.io matches the code on GitHub,

There is no tight coupling between GitHub and cargo/crates.io (sure it uses GitHub internally but that is an implementation detail).

But not only has it no tight coupling with GitHub it also doesn't require you to use git, you can use whatever version control you want and at worse you don't get support for "detect dirty repository".

Similar git tags are fundamentally unreliable as you can always "move" some to any arbitrary commit.

So IMHO the problem here is relying on code you didn't got from github which might not even use git to match a arbitrary tag on something on github which might not even be from the same author (but e.g. a mirror on GitHub from whatever VC the author uses).

But uploads to crates.io are immutable and are source code uploads, so you can just review them.

In general (independent of cargo) do review the code you use not some code you got form somewhere else which you hope/believe is the same.

[+] PragmaticPulp|4 years ago|reply

> There is no tight coupling between GitHub and cargo/crates.io (sure it uses GitHub internally but that is an implementation detail).

Cargo had a “locked” option that uses the URL and commit hash from the Cargo.lock file. If the crate, repository, or commit has changed then it won’t build.

This is what everyone uses for reproducible and secure builds, but it’s not as commonly used for casual use.

[+] rectang|4 years ago|reply

Treating PGP signed commits as privileged and only pointing at them as opposed to mutable tags seems like it would help.

[+] baby|4 years ago|reply

That's why I started https://github.com/mimoo/cargo-dephell and https://github.com/diem/whackadep btw, to try to get a sense of the risk in our Rust dependencies. Whackadep runs a a webserver that updates periodically and shows you what's up with your dependencies. If there's a new update, the idea is to estimate a sense of urgency (are we lagging behind? Is it a major version change? Is there a RUSTSEC advisory?) and a sense of risk. For example, it'll tell you if a new dependency version touches a `build.rs` file. I wanted to add more rules, like the ones mentioned in the article, but never had the time to do it.

[+] Diggsey|4 years ago|reply

There's going to be a risk to running someone else's code. There are two factors here:

1) Do I trust the code I think I'm running. 2) Am I actually running the code I think I'm running.

With (1) there's not really any way around it: someone or something has to review the code in some way.

Even the suggestion to have a larger standard library doesn't really address it: with a larger standard library the rust project needs more maintainers, and it might just get easier to get vulnerabilities into the standard library.

Someone could build a tool that automatically scans crates uploaded to crates.io. It could look for suspicious code patterns, or could simply figure out what side-effects a crate might have, based on what standard library functions it calls, and then provide that information to you. For example, if I'm looking for a SHA256 crate, and I notice that the crate uses the filesystem, then I might be suspicious.

With (2) there are some easier options, such as making it easier to download or browse the contents of a crate directly from crates.io, or have a tool to show the full dependency source diff after a `cargo update`. For initially installing the crate, the number of downloads is a pretty good indicator of "is this really the crate I meant?".

[+] pdimitar|4 years ago|reply

There's zero backdooring involved anywhere in this article. His most convincing argument seems to be "if your account as a package maintainer is hijacked then bad things could happen" -- well yeah, thanks for the insight Sherlock.

I'd be genuinely excited to read objective and deeper analyses of the Rust ecosystem in which I am looking to invest myself further. I want to know what exactly I am getting involved with so I'd welcome any good criticisms of it.

But not click-baity articles with almost zero substance inside. He's basically repeating old lists of risks of human error.

[+] jynelson|4 years ago|reply

> While it’s possible to audit the code of a crate on https://docs.rs on clicking on a [src] button, it turns that I couldn’t find a way to inspect build.rs files. Thus, combined with a malicious update, it’s the almost perfect backdoor.

Docs.rs has its own source view on /crate that's separate from rustdoc's. For example, you can see the build.rs for boring-sys on `https://docs.rs/crate/boring-sys/1.1.1/source/build.rs`.

[+] richardwhiuk|4 years ago|reply

You can also download the crate directly from crates.io

[+] thorum|4 years ago|reply

I’m increasingly fatalistic about computer security. It seems like your options are carefully auditing all dependencies (difficult and maybe impossible if the dependencies are highly technical or the malicious code is sufficiently subtle or obfuscated) every time you update, not updating at all (which leaves you vulnerable to all the bugs and other security issues in the version you choose to pin), or not using dependencies at all (by spending months or years totally rewriting the libraries and tools you need, and of course your own code will have bugs too).

Fixing the points addressed in this article helps by making it harder to slip these backdoors in, but will never be foolproof unless every single library has a maintainer with the skills to detect subtle bugs and security issues, who audits every line of code.

Even then the marketplace for unreported zero day vulnerabilities means that there are probably undiscovered vulnerabilities somewhere in your dependencies (or in the code for your IDE or OS or Spotify app or mouse driver...) that can be exploited by someone.

I’m reminded of the Commonwealth series by Peter Hamilton, in which the invading aliens have no machines, and quickly discover that ours are full of bugs that can be exploited to turn against us. I don’t know what the solution is. Sandboxing your development in a codespace like Gitpod is a big improvement for sure, but even in Gitpod a lot of people import credentials and environment variables that can be stolen. (And what dependencies is Gitpod itself running?)

[+] froh|4 years ago|reply

I think we have, as an industry, for a long time not seen the true value proposition of "Linux distributions". They do quite some boring and tedious security auditing, for example review setuid binaries to the point they drop from root into user privileges; and they backport security patches, so security updates are binary compatible drop in replacements.

When a binary distribution is widely used, the beneift is shared bug fixing and hardening, the disadvantage is somewhat dated libraries.

It's a model I understand.

What I don't understand is this idea of bootstrapping infrastructure via curl https://..../setup.sh && ./setup.sh, and the equivalent import of "modules", whatever you call them in your language of choice, straight from the web.

[+] throwawaygh|4 years ago|reply

> carefully auditing all dependencies (difficult and maybe impossible if the dependencies are highly technical or the malicious code is sufficiently subtle or obfuscated)

...yeah, a business is responsible for the integrity of its supply chain. There's nothing fatalistic about this. Running a business with potential liabilities is different from having a high school programming hobby.

If you're using community distributions of open source software in a security-critical context (e.g., any machine that touches PII) then you should absolutely white-list dependencies and either (1) have internal auditing mechanisms in place for those dependencies or else (2) have good reason to trust the QA procedures of the underlying community (and still do some basic auditing on every update anyways).

Everything else should be carefully sand-boxed and basically assumed to be pwned/pwnable.

If some rando came up to your contractor and offered them free concrete for use in your foundation, and the contractor said yes without any due dilligence, you would have every right to sue that contractor out of existence.

The www isn't a wild west anymore. The era where any middle schooler can build a six figure business by serving as the middle man between open source packages and end-users should probably come to a close. And I say that as someone whose middle school software freelancing business cleared lots of revenue by the end of college.

I wonder if this could be a revenue model of OSS. Cyber insurance providers should probably stat weighing in on these supply chain issues soon.

[+] Groxx|4 years ago|reply

I'm gonna pimp my own complaint here: https://news.ycombinator.com/item?id=29125409

I think library permissions systems would mitigate or effectively eliminate a huge amount of these, and significantly raise the cost or reduce the targets of nearly all attacks.

Libraries are, in practice, treated as black boxes. I think that's largely reasonable - that's almost the whole point of leveraging someone else's work. But our languages/etc do not allow doing that in any sane way. I think that's completely ridiculous.

[+] worik|4 years ago|reply

But there it can be so much better, or so much worse.

Worse is Node.js (not bothering with the unanswerable question: Why Node.js?) thousands and thousands of package downloads. Long chains of transitive dependencies. A long storied history of security/reliability catastrophes.

I love Rust. But I have always thought having the compiler download dependencies is a very bad idea. It would be much better if the programmer had to deliberately install the dependencies. Then there would be an incentive to have less dependencies.

This is currently a shit show, because it is easier to write than read, to talk than listen. New generations of programmers refuse to learn the lessons of their forebears and repeat all their mistakes, harder, bigger, faster

[+] remus|4 years ago|reply

> I’m increasingly fatalistic about computer security. It seems like your options are carefully auditing all dependencies (difficult and maybe impossible if the dependencies are highly technical or the malicious code is sufficiently subtle or obfuscated) every time you update, not updating at all (which leaves you vulnerable to all the bugs and other security issues in the version you choose to pin), or not using dependencies at all (by spending months or years totally rewriting the libraries and tools you need, and of course your own code will have bugs too).

There is also the option of having trusted third parties review code. This is by no means an easy option but it does seem more feasible than everyone auditing every line of code they ever depend on. You do end up with spicy questions like who do we trust to audit code? Why do we trust them? How are they actually auditing this code?

[+] panny|4 years ago|reply

>It seems like your options are carefully auditing all dependencies

"given enough eyeballs, all bugs are shallow"

Rust has an eyeball problem. Not saying this solves everything, but it certainly isn't helping. NPM has had several catastrophic security failures, but they all seem to be noticed rather quickly due only to the staggering number of people using JS.

[+] _tom_|4 years ago|reply

I think that the rise of automated dependency resolution tools, like maven, has made this exponentially worse. It's routine for tools to to have hundreds or even thousands of dependencies, something that would never happen if you manually had to manage them.

My .m2 directory has 4000+ jar files in it, for example.

They make you more productive, but much more vulnerable.

[+] rectang|4 years ago|reply

Is there a way to buy into PGP identity-based controls for crates.io packages? To say, "I trust the keys in this whitelist, so trust packages signed by those keys."

> Thirdly, using cloud developer environments such as GitHub Codespaces or Gitpod. By working in sandboxed environments for each project, one can significantly reduce the impact of a compromise.

That's appealing but expensive. I wish I could effectively sandbox a local developer machine. External boot drives, maybe?

[+] jrochkind1|4 years ago|reply

Most of these are common to other platform packaging systems, and I'm not sure I've seen any especially interesting solutions to them.

The macro-based ones are rust-specific and seem especially devious and challenging to me.

[+] cntlzw|4 years ago|reply

How do rust crates compare with something like maven or npm? It looks like some issues for example Typosquatting can be done in all of these dependency managers.

[+] Ajedi32|4 years ago|reply

Yep, supply chain attacks are a near-universal problem with programing language package managers.

I think there's a lot of room for improvement here. Some good low-hanging fruit IMO would be to:

1. Take steps to make package source code easier to review.

1.1. When applicable, encourage verified builds to ensure package source matches the uploaded package.

1.2. Display the source code on the package manager website, and display a warning next to any links to external source repositories when it can't be verified that the package's source matches what's in that repo.

1.3. Build systems for crowdsourcing review of package source code. Even if I don't trust the package author, if someone I _do_ trust has already reviewed the code then it's probably okay to install.

2. Make package managers expose more information about who exactly you're trusting when you choose to install a particular package.

2.1. List any new authors you're adding to your dependency chain when you install a package.

2.2. Warn when package ownership changes (e.g. new version is signed by a different author than the old one).

Long-term, maybe some kind of sandbox for dependencies could make sense. Lots of dependencies don't need disk or network access. Denying them that would certainly limit the amount of damage they can do if they are compromised, provided the host language makes that level of isolation feasible.

[+] tetha|4 years ago|reply

Maven Central is somewhat resilient against this. In the java world, an artifact is identified by a group-id, an artifact-id and a version, and some technical stuff. The group id is a reversed domain, like org.springframework.

If you want to upload artifacts with the group id "org.springframework", you first have to demonstrate that you own springframework.org via a challenge, usually a TXT record or some other possibilities for github group-ids and such.

It's not entirely bulletproof, because you could squat group-ids "org.spring" or "org.spring.framework" (if you can get that domain). However, once a developer knows the correct group id is "org.springframework", you need additional compromises to upload an artifact "backdoor" there.

Edit - and as I'm currently seeing, PGP signatures are also required by now.

[+] typicalbender|4 years ago|reply

I haven't thought this through at all but are you aware of any package repositories that do something like levenshtein distance between package names maybe combined with a heuristic on common mistyped characters to not allow typosquatting?

[+] ChrisSD|4 years ago|reply

These do all seem to be things that apply to most package managers of this kind. So it would be good if Rust could find solutions that can be applied more broadly.

[+] junon|4 years ago|reply

npm has some guards for typosquatting. They're annoying when you run into them but I appreciate that they're there. I have no idea how effective or extensive they are, though.

[+] peterth3|4 years ago|reply

Discussion on /r/rust about this article:

https://www.reddit.com/r/rust/comments/qw3w01/backdooring_ru...

[+] verdverm|4 years ago|reply

I wrote https://verdverm.com/go-mods/ to talk about ways Go avoids some of these pitfalls. The forethought that went into `go mod` is one of the reasons I like and trust Go

[+] Groxx|4 years ago|reply

I only see one that it avoids: domain names / URLs as import paths makes ownership much more clear, and slightly harder to achieve typo-squatting... sometimes. And I do very much like this part of go modules, it also helps decentralize the whole system a fair bit. I sincerely hope it becomes the dominant package-name strategy in time.

But lets pick another that seems on the surface pretty likely to be mitigated: source for downloaded-version X not matching version X repo's source, under "Malicious update" with cargo's `--allow-dirty`. After all, goproxy pulls from git repos directly, right? There's no --dirty flag or anything to push random garbage.

That's still a problem! Git tags are mutable, as are git repositories as a whole. You can absolutely tag a malicious version, get it into goproxy, and then change or remove the tag and any associated commits. The goproxy doesn't even store the SHA for correctly-tagged versions, only the code and a checksum of the code it saved, so finding the commit that it originally pointed to can be difficult or impossible. You can download the module and read the code from that, but that's true of any non-binary dependency system. You can't publish a change to an already-published version, but that's true of cargo too (afaik) as well as most package hosts (afaik), though goproxy takes a minor technical step further to make that accident-resistant (or at least easily detectable. which is great, everyone should do that).

[+] steveklabnik|4 years ago|reply

A tremendous amount of forethought was put into Cargo and Crates.io. The difference is that many folks look at the same problems and come to different conclusions about what to do to, not negligence.

[+] caffeine|4 years ago|reply

Seems like you could address with a super-crate that includes "trusted" crate releases as "features"

That crate could involve some automation like:

* Checking that the code in the crate matches the code in Github

* Checking whether the latest commit is from a new committer, or whether there is any code comitted by a user not in a whitelist,

* Checking whether the package has any known security advisories

* Checking that crate signatures match some whitelist

* Running a project that includes the crate in a sandbox and seeing whether there are any files accessed, network accesses, etc. that were not pre-whitelisted

New versions of included crates would have to go through this battery of checks before they get bumped in the super-crate.

Crates that want to be included as features of super-crate or that need to change/add significant functionality, or add dependencies, would need to make a PR to update the relevant whitelists, which could then be reviewed by the super-crate team

[+] epage|4 years ago|reply

This has come up several times in the past. One name for it was stdx.

Some in the ecosystem are very cautious of picking winners and losers, limiting the exposure to new break-out crates. Rarely recommending crates for different problems. This comes at the cost of making it a harder barrier to get involved because you need to be "in the know" for what crates to use or avoid.

Another problem with stdx is if anyone uses types from this in their public API, they are decoupled from the individual crates semver constraints which makes it hard to know which breaking changes from your dependency are a breaking change in your API.

[+] lillecarl|4 years ago|reply

Nobody is mentioning C#, but my experience there is that I rely on a lot less dependencies and a rather big standard library from Microsoft.

Microsoft has been splitting the standard library into separate dependencies now, but they're still maintained by them and I feel safe depending on them.

[+] Lifelarper|4 years ago|reply

> I’m not sure if it’s by bots or real persons

The bot usage is a significant amount of the low level noise, I've published things of no use to anyone and they always rack up a lot of dl's despite no one practically using them for a long time.

> Firstly, a bigger standard library would reduce the need for external dependencies

There's years worth of the same arguments tiringly made over and over again (same with namespacing) on the rust forum, everyone has played their hand on this issue a dozen times now, the community clearly has a majority stance on such things.

> A variant of the previous technique is to use the --allow-dirty flag of the cargo publish command.

Please correct me if I'm wrong but thought that flag simply allows uncommitted changes to be published, the source is still availabile for anyone to view on crates.io

> We're sorry but this website doesn't work properly without JavaScript enabled. Please enable it to continue.

Works perfectly fine for me. Maybe you couldn't serve me a gdpr or something. Thankfully I can keep it turned off for now :)

[+] loeg|4 years ago|reply

You don’t really need the ‘—allow-dirty’ flag to do as the author claims. There’s no enforcement that the local git commit is ever published to a public repo.

[+] pornel|4 years ago|reply

https://lib.rs/cargo-crev tries to address this.

It allows you to review the actual published source of your dependencies. It then can check whether your project only uses reviewed dependencies.

Reviewing everything is of course a lot of work, so there’s an option to mark crate owners as trusted, and also reuse code reviews made by people you trust.

[+] mullr|4 years ago|reply

What are people doing about this on the client side? The solution that comes to mind is to do all my Rust builds in a sandbox of some kind, but with rust-analyzer involved, I'd likely have to put my editor in there as well.

[+] gpm|4 years ago|reply

There's some work towards moving the scarier parts of rust builds (e.g. procedural macros, that run arbitrary code) into a wasm-based sandbox. E.g. [1]. Obviously doesn't make the final artifacts safe to run though, and I also wouldn't trust LLVM to have no bugs exploitable by feeding it bad code, but at least it would raise the bar.

[1] https://github.com/dtolnay/watt

Edit: And someone on reddit brought up vscode's dev containers [2], to move everything into docker. Obviously docker isn't really a security sandbox, but again it raises the bar.

[2] https://code.visualstudio.com/docs/remote/containers

[+] zelos|4 years ago|reply

> How to protect?

> By pinning an exact version of a dependency, tokio = "=1.0.0" for example, but then you lose the bug fixes.

Surely no one uses version ranges in production? Is the default really not to use an exact version for crates?

[+] steveklabnik|4 years ago|reply

The default is to declare ranges, but then you get a lockfile after an initial build, and Cargo will use those exact versions until you ask for changes.

[+] Macha|4 years ago|reply

The default is to use ^x.y.z so it'll pull in patch versions.

[+] gspr|4 years ago|reply

I love Rust, but it's particularly frustrating to me how it (nearly) forces you to use a single centralized package repository. Shameless plug for an issue: https://github.com/rust-lang/cargo/issues/10045

I don't have a gripe with crates.io per se, but that the tooling doesn't work offline and works poorly without crates.io is a real problem.

201 comments