Typosquatting programming language package managers

[+] wbond|9 years ago|reply

We've gotten flack from package developers submitting new packages to Package Control [0] because all additions to the default channel are hand reviewed. Part of this process is to prevent accidentally close package names, to try and encourage collaboration and to encourage developers to actually explain what their package does and how to use it.

My hope is to be automating a large amount of the review in the next few months, however I think this is a good argument for never having it be fully automatic. Having a human sanity check submissions isn't a terrible idea if we can keep the workload down.

Certainly this doesn't prevent a malicious author from posting a legitimate package and then changing the contents to be malicious, but that can be somewhat solved by turning off automatic updates.

[0] https://packagecontrol.io

[+] SCdF|9 years ago|reply

Hey Will,

Thanks for keeping Package Control high quality, I know it's highly appreciated :-)

[+] bcg1|9 years ago|reply

Sonatype has a manual review process as well before allowing new projects to deploy to Maven Central. [1][2]

One step to mitigate things like this as well would be to have some sort of "crowd-sourcing" command in the package manager program... like "npm flag coffe-script" or something like that to alert repository maintainers of possible issues.

[1]: http://central.sonatype.org/pages/ossrh-guide.html [2]: http://central.sonatype.org/articles/2014/Feb/27/why-the-wai...

[+] yxhuvud|9 years ago|reply

Typosquatting can be flagged automatically (for reviewal of a human later) using Levenshtein distance.

[+] eudox|9 years ago|reply

Keep fighting the good fight.

[+] mercer|9 years ago|reply

> Certainly this doesn't prevent a malicious author from posting a legitimate package and then changing the contents to be malicious, but that can be somewhat solved by turning off automatic updates.

Perhaps you could make this safer by adding an automatic check for how much the package has changed since the last version? And at least warn the user when they want to update?

[+] hoodoof|9 years ago|reply

Perhaps list all new packages to the community and require/request validation or flagging by the community, along with listing similar package names.

[+] unknown|9 years ago|reply

[deleted]

[+] Mahn|9 years ago|reply

> In the thesis itself, several powerful methods to defend against typo squatting attacks are discussed. Therefore they are not included in this blog post.

http://incolumitas.com/data/thesis.pdf section 5 "Practical implications". Just wanted to point out that in case you skipped it it's worth a read, some interesting proposals there that are worth discussing with package manager maintainers.

I particularly like the preemptive approach of auto-blacklisting common typos by simply monitoring the number of times a specific unexisting package is requested over time (5.10). So if a lot of people regularly attempt to install the unexisting package "reqeusts", it could signal that it's a common typo and should be blacklisted to prevent malicious use in the future. False positives could always be sorted out manually by communicating with the package manager maintainers.

[+] zeveb|9 years ago|reply

Reminds me of the quote, 'there are only two hard things in computer science: naming things, cache invalidation and off-by-one errors.'

I think that this clearly falls under the heading 'naming issue.' People know what they want, but do not enter it properly.

I can't think of a 100% off-hand, which isn't surprising, because it's a hard problem.

pmontra's suggestion to use typo blacklisting ain't a bad idea. Maybe some sort of reputation-per-name could help?

[+] a_t48|9 years ago|reply

Sure it's not an off-by-one[-key] error? :)

[+] blowski|9 years ago|reply

Banks have a similar problem when people write cheques or set up standing orders. You have to put a name and the account number.

I wonder if you could do something similar here - enter the name of the package and a code of some sort. I haven't thought this through in a lot of detail.

[+] szx|9 years ago|reply

When you think about it, how different is the destructive potential of an npm/pip install from curl | bash that (some) people tend to froth at the mouth about?

It's pretty mind blowing how big of a blindspot package installers are. I guess running everything inside a e.g. Docker container/VM would be a partial interim solution for the paranoid?

[+] lmm|9 years ago|reply

> When you think about it, how different is the destructive potential of an npm/pip install from curl | bash that (some) people tend to froth at the mouth about?

It's a bit better - there is only one possible source of compromise rather than everyone on the network path. Given that npm/pip likely keep archives of all packages uploaded, it would be much harder (perhaps impossible) to attack someone secretly this way, at least in the long term.

Good package managers require signing of uploads (e.g. maven central requires every package to have a GPG signature; Debian goes further, and requires your key to be signed by an existing member of the organization). If the client checks the signatures you end up with a system that's perhaps actually secure.

[+] raesene10|9 years ago|reply

For me they're very similar. I actually did a talk last year for OWASP AppsecEU where I started with the curl|bash bit and pointed out where rubygems/npm etc aren't really a lot better in some ways

https://www.youtube.com/watch?v=Wn190b4EJWk

[+] eudox|9 years ago|reply

I'm a fan of the approach of personally submitting projects to the repository maintainer (e.g. through GitHub issues), and having the maintainer personally approve them.

It does raise the barrier to entry, but it would prevent typosquatting and regular namesquatting.

EDIT: Does any major package manager provide a "did you mean" functionality, offering a list of actual package names similar to what you typed?

[+] akavel|9 years ago|reply

then the maintianer must have perfcet sigth and never ovrelook even one tpyo :]

and then also have perfect memory of all packages and notice that similarly named package is too (for some value of "too") similarly named to some already existing one... even if e.g. both are a correct dictionary word.

[+] baby|9 years ago|reply

After watching this awesome Defcon talk https://www.youtube.com/watch?v=YqxaKGA9Lnc I wondered if there was any use cases for bit/typo squating in crypto. This is a pretty cool one! Not crypto but interesting none-the-less :)

[+] pmontra|9 years ago|reply

Probably the maintainers of the package managers know which typos their users do, because of the 404s in the logs or equivalent errors. A preventive action could be starting to blacklist any name resolving to 404. If somebody eventually tries to upload a package in the blacklist, a maintainer should check the code and whitelist the name. Obviously people can be very crative with typos and with squattinq and there is no real protection against mistakes.

[+] utexaspunk|9 years ago|reply

Might it work to mandate that the name of an uploaded package have a minimum levenshtein distance (or similar calculation) from the names of all the existing packages? Then you wouldn't have to worry about maintaining a blacklist.

[+] hughes|9 years ago|reply

Surely some troll would deploy a fleet of machines that flood package indexes with requests to available names, effectively blacklisting entire dictionaries and eventually all short names.

[+] epalmer|9 years ago|reply

> Obviously people can be very crative with typos and with squattinq and there is no real protection against mistakes.

I see what you did.

[+] Mizza|9 years ago|reply

This seems like pretty unethical research to me.

Also, doesn't point out that the bigger threat is that this is wormable.

[+] tantalor|9 years ago|reply

The doc (http://incolumitas.com/data/thesis.pdf) does have a short section on ethics, but IMHO it completely misses the point of the ethical concerns in running unauthorized, non-sandboxed code on devices you don't own. Instead it justifies the research by saying the threat cannot be shown unless the vulnerability is exploited, which is true, but that fact does not justify the research.

The acknowledgements mention 2 of the university advisers and a PyPi admin consented to the "notification program".

Still, people with good intentions have been prosecuted and convicted for less. I would be very concerned for this student.

[+] throwawaysocks|9 years ago|reply

There was no actual intrusion, so this feels like fair game to me. Especially since mitigating a very possible attack vector is a direct result of running experiment. Still, hopefully the researchers got an IRB to sign off on the experiment setup...

[+] SolarNet|9 years ago|reply

Yea, this would never get past my university's ethics department. I'm actually surprised he was allowed to do this. Maybe it's partially due to the fact our ethics department is also worried about liability.

[+] markbnj|9 years ago|reply

Perhaps this could have been made cleaner by relying on the package manager for download counts only, and then demonstrating the code execution scenario on research machines only. If you wanted to avoid actually downloading anything to the user's machine (after all, they expect a 404 in this case, not a package even be it a harmless one) you'd perhaps need the cooperation of the repo admins to a greater extent.

Anyway, this is all part of why I always try to build inside a container, or at least in a virtualenv where I don't need to sudo the install.

[+] placeybordeaux|9 years ago|reply

Yeah I wouldn't want to find myself in court hearing

>17000 computers were forced to execute [unauthorized] arbitrary code

Certainly a crime in the US, not sure about Germany.

Nice execution though!

[+] 21|9 years ago|reply

I wonder about the legality. It looks to me like he isn't technically responsible, since he didn't access any authorized computer himself.

If I intentionally leave an infected USB drive on the ground, someone picks it up and sticks it into it's computer, am I liable?

Seems like it could go either way.

[+] PeterisP|9 years ago|reply

Part of the problem is the many packages that require sudo permissions to install - IMHO that should be an exceptional case, but it isn't.

[+] nneonneo|9 years ago|reply

Packages often require sudo in order to install to the global interpreter - it's a security hazard otherwise. Imagine a Python package which overrides the sys module. If it didn't require sudo, anyone could install it and compromise Python for everyone else (or, for instance, compromise setuid programs).

The two solutions here are user-local packages (pip --user, for example) and virtual environments.

[+] cormacrelf|9 years ago|reply

And 'npmjs.org' is misspelled as 'npmsjs.org' in the introduction. Nice.

[+] nichochar|9 years ago|reply

Wow, this a very good study and explanation of what typo squatting is, and I really liked how he proved it's effectiveness.

I wonder what kind of steps we can take to prevent this risk.

[+] ysavir|9 years ago|reply

Instead of blacklisting, why not respond with a "You requested package ABD, but we think you might mean package ABC. Enter 'yes' to continue or anything else to start over."

That way authors can continue to use any name they want, and the emphasis is on letting installers know that they might be installing the wrong package.

[+] zmanian|9 years ago|reply

We need operating system vendors to give us a mechanism for easily creating and managed sandboxed dev environments.

Ones dev environment should be a place where remote code execution is a high probablity and we need better tools to partition that from high value data.

[+] airless_bar|9 years ago|reply

This only seems to be an issue for languages where packages reside in a global namespace, like Python, Rust etc.

I think most languages these days are a bit smarter and avoid this beginner mistake (for various reasons).

[+] bennofs|9 years ago|reply

Did anyone else find it surprising the the number of total requests (45334) is so much higher than the number of unique total requests (17289)? It is more than twice the number of unique requests!

Possible explainations:

* Perhaps many of those are automated build systems, which would also explain the high number of systems with admin access (for example, if you use travis without docker, every build runs in a clean vm with admin access).

* People download one package and install it multiple times? Seems unlikely

Any other ideas?

[+] Guillaume86|9 years ago|reply

I think he forgot to define a baseline (could be wrong, I didn't read the paper). He should have generated a few packages with a completely innocent name (and maybe some packages with just a GUID as a name) to see how much downloads / installs they get too.

[+] lighttower|9 years ago|reply

The person who ran the line,

sudo pip install lumpy (instead of numpy)

Ran it again because it 'didn't work'

[+] cderwin|9 years ago|reply

In the case of python (not sure about the other package managers) if a valid package requires the hacked package, each project that requires that valid package will download and install the hacked package separately if you're using virtual environments. Also if you're using docker you reinstall everything when your requirements file changes.

[+] caseysoftware|9 years ago|reply

Numerous developers and/or building multiple servers behind a single IP address aka NAT. It's pretty common.

[+] joepvd|9 years ago|reply

Automated testing, continuous integration/delivery, et cetera download and install packages pretty often. If the type is made in the requirements.txt or package.json or what have you, the error can be repeated very often up to and including production.

[+] unknown|9 years ago|reply

[deleted]

[+] mirekrusin|9 years ago|reply

with npm there should be at least an option which prompts for Y/N/A when package has preinstall hook.

but even this just tries to put the problem under carpet. you could still for example have requests package which just installs request package, works as expected, just sends request/response to your own server from time to time. ie. when there's http basic auth used only.

[+] seldo|9 years ago|reply

It is possible to disable install hooks at install time by running npm install with --ignore-scripts.

You can also make this the default, with npm config set ignore-scripts true (and then --ignore-scripts false at install time if you wish to run them).

[+] zanchey|9 years ago|reply

Solaris did (does?) this - "this package contains installation scripts which run as superuser" or words to that effect. Unfortunately I never found a owa to inspect the scripts directly so it wasn't all that helpful.

[+] mbroshi|9 years ago|reply

Maybe this is overly naive, but when I make a typo in the Google search bar, it doesn't even search for my typo-ed term (even if it would have gotten some hits), it searches for what I actually meant to type. Can't package managers have a similar feature?

[+] ryanmarsh|9 years ago|reply

So last week my client discovered there's a gem named bunlder... sigh

[+] jogjayr|9 years ago|reply

I thank my stars every time I get a "Package not found" error due to a typo, because I'm reminded that it could have been much worse.

[+] jwilk|9 years ago|reply

Trying to parse the title made my head hurt. It should be "Typosquatting software package names" or something.

143 comments