A web of trust for NPM

[+] ratww|5 years ago|reply

I will die on that hill, so here it goes again: The problem with NPM is not the number of runtime dependencies.

Most Javascript projects would actually fare pretty well when compared to other languages if only runtime dependencies were taken into account.

Javascript staples like React, Vue, Svelte, Typescript and Prettier actually have zero runtime dependencies. Also, the ES6 standard library is not as bad as people claim.

The real problem is with development dependencies. The number of dependencies required by Babel, Webpack and ESLint is the cause of 99% of the dependency bloat we complain about in JS projects. Those projects prefer to have monorepos, but when they land on your machine they're split into tens or hundreds of dependencies. Also remember that left-pad was only an issue in 2016 because a Babel package required it. If those projects were able to get it together we wouldn't even be having this conversation. Solve this and you'll solve the biggest complaints people have about JS.

I really would like to see a discussion on this, as most people seem to put a lot of blame on JS as a whole, while it's mostly a handful of popular projects generating all the complaints.

[+] preommr|5 years ago|reply

> I really would like to see a discussion on this, as most people seem to put a lot of blame on JS as a whole, while it's mostly a handful of popular projects generating all the complaints.

Discussion on what? You're right.

I know we joke about left-pad, but like you pointed out, a lot of big js hitters don't have many if any dependencies. That's true, but irrelevant.

Those dev-dependencies are still potential security threats. With all the minification and other crap, it's really hard to know what gets injected into the final runtime. And if not security, it's still development hell. Development, yes, but that's a, if not the, thing that programmers really care about.

And even if runtime dependencies are less common, there are a lot of developers that still subscribe to using as many deps as possible, especially because the web can be quite fragmented and they have to support a myriad of different target platforms. So even if it's a lesser issue, I think it's fair to talk about the js ecosystem as a whole when making criticisms about its dependency disasters.

[+] ohazi|5 years ago|reply

Development dependencies are especially terrifying because we generally don't use any sort of sandboxing. Any one of these dependencies could append to .bashrc to get your computer to run literally anything, and then hide the evidence.

And developer machines are particularly juicy targets because they often have ssh keys to production machines lying around.

[+] fn1|5 years ago|reply

> If those projects were able to get it together we wouldn't even be having this conversation. Solve this and you'll solve the biggest complaints people have about JS.

So the way to go would be

1. compile a list of opensource-dependencies by usage/complexity

2. go through their dependencies

3. find easy fixes and create PRs for the projects.

[+] striking|5 years ago|reply

> Solve this and you'll solve the biggest complaints people have about JS.

Or, perhaps, we can just tell people that their complaints are nonsensical when most of the deps are not actually packaged into the final app or being used at runtime, instead of contorting ourselves to fit their silly worldview.

I'm not losing any money over this. Why should I care?

[+] andreareina|5 years ago|reply

> the ES6 standard library is not as bad as people claim.

I don't know how bad some people claim it is, but I claim it's pretty bad when it lacks basics such as string formatting, str{f,p}time, others that I can't bring to mind right now but I'm sure I'd find if I trawled through my old code.

[+] korijn|5 years ago|reply

So, assuming you're right, what might be a strategy to push these projects to a better place?

[+] hombre_fatal|5 years ago|reply

Your comment makes it sound like you don't realize Javascript runs on the server, where large stacks of transitive runtime deps are actually common, and you're suggesting that because there's another problem (development deps) then the other problem (runtime deps) doesn't exist.

[+] phist_mcgee|5 years ago|reply

Great point.

What are your thoughts on RomeJS and trying to unify the entire toolchain?

https://github.com/romefrontend/rome

[+] fergie|5 years ago|reply

I will die on this hill with you.

I would also add that we should be striving to release uncompiled JavaScript, since it is more auditable, more debuggable, and is naturally distributed. Dev dependencies are often basically compilers (Babel, React, TypeScript, Angular, etc), and JS compilers should be avoided if possible.

[+] SirensOfTitan|5 years ago|reply

We built our app on node and typescript, and I would never choose it again at this point because of the package ecosystem. We do a lot to validate integrity of packages (including checking in vetted archives to our repo), but it’s hard. Our images are ballooned to like 500-600MB (we’ve gone past the GB mark because of certain packages messing up dependencies before) based on a pretty conservative list of dependencies because of node_modules. I’m constantly fighting a battle against image size increases. The sheer number of files in node_modules ensures that I/O is always a problem for image size and build speed on CI.

Solutions like yarn berry hardly help: zipfs and patched tsservers are still annoying in many editors. Often packages break because package maintainers include implicit dependencies or the packages their packages depend on do so. Arc has frozen emacs for me several times when jumping to definition in a zip.

I’m just so over the package situation for node.

[+] martpie|5 years ago|reply

500-600MB sounds like you are shipping dev dependencies in your images.

You should use npm/yarn's production flags when installing your dependencies for your images, so you only ship runtime dependencies.

Your images will shrink to 100-150MB.
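To make the size difference concrete, here is a rough TypeScript sketch of why a production-only install is smaller: it only needs the transitive closure of the runtime dependencies, not of the dev dependencies. The `Graph` shape, the `installClosure` name and the package graph in the usage note are all made up for illustration; this is not how npm or yarn resolve packages internally.

```typescript
// Hypothetical, simplified "lockfile": package name -> its direct dependencies.
type Graph = Record<string, string[]>;

// Walk the graph from the given roots and return every package reached, sorted.
function installClosure(roots: string[], graph: Graph): string[] {
  const seen: Record<string, boolean> = {};
  const stack = roots.slice();
  while (stack.length > 0) {
    const name = stack.pop() as string;
    if (seen[name]) continue;
    seen[name] = true;
    // Packages missing from the graph are treated as having no dependencies.
    for (const dep of graph[name] || []) stack.push(dep);
  }
  return Object.keys(seen).sort();
}
```

With a toy graph where `express` pulls in `body-parser` and `webpack` pulls in `terser`, `installClosure(["express"], graph)` yields two packages, while starting from both `express` and `webpack` yields four. Only the first set needs to end up in the image.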

[+] kitten_mittens_|5 years ago|reply

Yarn 2’s Plug and Play has actually been a dream dep-size-wise. Once you get over the initial hurdles of setup, you can do things like vendor your deps for zero installs. core-js alone (two major versions because @rjsf felt compelled to polyfill in a library) is 3K files worth of node_modules.

[+] notanotherycomb|5 years ago|reply

Are you writing an OS?

I'm pretty sure the linux kernel compiled is < 350mb. Debian net install is less than this

[+] 7373737373|5 years ago|reply

> Some have argued that the ill health of the npm registry is a social, rather than a technical problem

In some cases it is, yes, for packages that require so many access privileges that they can subvert the entire system they run on.

But this is not the case for (I'd estimate) the majority of libraries, because they are purely computational: they only transform data and do not need access rights to any external interfaces (filesystem, network, user input, displays, ...). Malicious data generated by sandboxed programs is still a problem, but the problem would be localized.

There are efforts underway that would allow Javascript programs to effectively and economically sandbox each other and grant only the minimum number of privileges they need to perform their tasks: https://medium.com/agoric/pola-would-have-prevented-the-even...

That means avoiding global mutable state and https://en.wikipedia.org/wiki/Ambient_authority, and being able to grant rights in an opt-in fashion and to transfer them in a way that is robust in multi-party settings, in accordance with https://en.wikipedia.org/wiki/Capability-based_security

This is the https://en.wikipedia.org/wiki/Principle_of_least_privilege and I encourage every language, virtual machine and operating system designer to understand it and implement it in their systems.

Then, the social attack surface can be technically minimized.

[+] inopinatus|5 years ago|reply

So basically, rely on about 5% of JavaScript (my copy of JavaScript: The Good Parts is looking slimmer every day) and hope that everything you’re either directly or transitively exposed to has exactly the same standards you do and will continue to do so in perpetuity, and/or build tons of additional scaffolding to try to sandbox violators, because that has always been such a sure fire path to secure code.

The language, and its ecosystem, is a baroque Gormenghast of curiosities built on an ancient sewer where nightmare beasts still roam, and you’ll never stop it stinking just by holing up in the throne room and hoping a few trusted paladins will decontaminate the rest.

[+] RL_Quine|5 years ago|reply

I don't think that the same community producing huge quantities of single use libraries for the sake of padding their resumes will get involved with sandboxing. I recently installed a relatively simple piece of software using NPM and was stunned when it downloaded hundreds of dependencies from god knows where. There's simply no ability for anybody to ever evaluate the security risk of NodeJS applications.

https://npm.anvaka.com/#/view/2d/zigbee2mqtt

[+] slaymaker1907|5 years ago|reply

More languages should really be doing this and encouraging it. The JVM can sandbox pretty well using a security manager, but most people don't use the sandbox.

[+] geofft|5 years ago|reply

The principle of least privilege/authority has been around for a while, and the reason we don't see much adoption of it in real-world systems is not because it's unknown.

The first question is overhead: it's true that the majority of libraries are purely computational, but that means that there's frequent interaction between code written by the end developer and code from the library. If every call to, say, lodash's _.filter goes through a process to marshal the programmer's list, send it to a separate execution environment, and then marshal it right back in the other direction to call the predicate, people would choose not to use it. I do agree that the proposal in the post you link to seems to be on the right track - directly run the code in the current execution environment if it can be statically demonstrated that the code has no access to dangerous capabilities.

The second question is making the policy decision about whether to grant privileges. You might be familiar with this from your mobile phone: the security architecture is miles better than that of your desktop OS, but still, most people do say "yes" when asked to let Facebook, Twitter, Slack, etc. access their photos and their camera and their microphone, because they intentionally want those apps to have some access. What do you do in the above model when, say, the "request" library wants access to the network? Now it can exfiltrate all of your data. (The capability-based model is that you pass into the library a capability to access the specific host it should talk to, instead of giving it direct access, but again, if it did this, people would choose not to use it - the whole point of these libraries is to make writing code more convenient.)
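A minimal TypeScript sketch of that capability-passing idea, under stated assumptions: `Transport`, `HostCapability`, `makeHostCapability` and `fetchProfile` are hypothetical names, not the API of "request" or any real library, and the transport is synchronous and injected so the sketch runs without a network (a real one would be asynchronous).

```typescript
// Hypothetical raw network access: takes a full URL, returns the response body.
type Transport = (url: string) => string;

// The only thing a library receives: the ability to GET paths on one fixed host.
interface HostCapability {
  get(path: string): string;
}

// Wrap the raw transport so that the holder can never choose the host.
function makeHostCapability(host: string, transport: Transport): HostCapability {
  return {
    get(path: string): string {
      // Require an absolute path so a caller cannot smuggle in "@evil.example/...".
      if (path.charAt(0) !== "/") {
        throw new Error("path must start with '/'");
      }
      return transport("https://" + host + path);
    },
  };
}

// A "library" function that is handed the capability instead of the network.
function fetchProfile(api: HostCapability, user: string): string {
  return api.get("/users/" + encodeURIComponent(user));
}
```

The extra wiring at the call site (build the capability, pass it in) is exactly the inconvenience described above: it is safer, but it is more work than importing a library that can reach the network on its own.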

The other problem, and perhaps the most important, is that purely-computational libraries can still be dangerous. Yes, _.filter (and perhaps all of lodash) is purely computational, but if you're using it to, say, restrict which user records are visible on a website, and someone malicious takes over lodash, they can edit the filter function to say, "if the username is me, don't filter anything at all." Or if you had a capability-based HTTP client that only talked to a single server, the library could still lie about the results that it got from the server.
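To illustrate the filter scenario, here is a hedged TypeScript sketch. Nothing in it is lodash code; `UserRecord`, `honestFilter` and `tamperedFilter` are invented for the example, and the attacker name "mallory" is a placeholder. The tampered version does no I/O at all, so a sandbox that only restricts network and filesystem access would not notice it.

```typescript
interface UserRecord {
  owner: string;
  secret: string;
}

// What the application believes it is calling: keep records the predicate accepts.
function honestFilter(
  records: UserRecord[],
  keep: (r: UserRecord) => boolean
): UserRecord[] {
  const out: UserRecord[] = [];
  for (const r of records) {
    if (keep(r)) out.push(r);
  }
  return out;
}

// Hypothetical tampered version with the same signature and no side effects.
// If the predicate accepts a record owned by "mallory", the viewer is presumably
// mallory, so the backdoor returns every record instead of filtering.
function tamperedFilter(
  records: UserRecord[],
  keep: (r: UserRecord) => boolean
): UserRecord[] {
  const viewerIsAttacker = records.some(
    (r) => r.owner === "mallory" && keep(r)
  );
  return viewerIsAttacker ? records.slice() : honestFilter(records, keep);
}
```

For every other viewer the two functions return identical results, which is why this kind of change is hard to catch with ordinary tests.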

I think the way to think about it is that the principle of least privilege is a mitigation strategy, like ASLR or filtering out things that look like SQL statements from web requests. ASLR mitigates not being able to guarantee that your code is memory-safe; if you could, you wouldn't need it. SQL filtering mitigates making mistakes with string interpolation (but it comes with a significant cost, so you really want to avoid it if you can). Least privilege mitigates the reality that you cannot code-review all of your code and its dependencies to ensure that it's free of bugs. But, on the other hand, a mitigation is not a license to stop doing the thing you can't do perfectly - it's just a safety measure. You can still have serious security bugs from buffer overflows even with ASLR; you just have fewer. You should not use ASLR as an excuse to write memory-unsafe code. You can still have SQL injection attacks from people being clever about smuggling strings. You should not use a WAF as an excuse to not use parametrization in SQL queries. And you can still have malicious dependencies cause problems even in a least-privilege situation, because they still have some privilege. You should not use it as a reason to run dependencies you don't trust.

[+] sergeykish|5 years ago|reply

Reminds me of Gilad Bracha's Newspeak.

[+] stefan_|5 years ago|reply

Trying to solve the halting problem, are we? Remember that one of the most dangerous JavaScript APIs turned out to be a sub-millisecond monotonically increasing time source.

[+] arkadiyt|5 years ago|reply

Here's my hot take: supply chain attacks are a low risk for your organization - they are both low likelihood and low impact.

1) Low likelihood: when popular packages get subverted it is caught quickly due to how widely packages are distributed. After it's caught the problem is also heavily publicized for folks to take action, and registries remove the affected versions immediately so there is a very small exposure window.

2) Low impact: people who write malicious code into these packages don't have a specific target, they are writing dragnet malware, which typically means mining cryptocurrency or ransomware. If you're going to get hacked then that's the best possible outcome (as opposed to, e.g. a data breach).

Your security posture would have to be superb if supply chain attacks were anywhere near the top of your list - for the majority of companies they have more basic and targeted issues to worry about.

[+] jakear|5 years ago|reply

Eh... I don’t share your cavalier attitude. You assume these attacks aren’t targeted just because we haven’t seen them, but it wouldn’t be hard at all for an attacker to take control of a package through some means (purchase, social engineering, or just solving a problem more efficiently than others do and aggressively asking others to adopt it), then publish to npm a minified version of the package which includes some targeted exploit that doesn’t activate except in a specific environment. The source on GitHub would ofc not include the exploit, and there’s no push for reproducible builds in the npm world, so verifying that npm’s minified JS was built from the GitHub source is nontrivial and not something most shops would bother with.

[+] captn3m0|5 years ago|reply

On (2) low impact:

A few npm advisories mention packages that were uploading SSH keys and bashrc files.

- https://www.npmjs.com/advisories/541 (package==coffeescript)

- https://www.npmjs.com/advisories/765 (package==portionfatty12)

There have also been packages that would upload the environment variables (which increases impact significantly if this reaches production):

- https://blog.npmjs.org/post/163723642530/crossenv-malware-on... (package==crossenv)

- https://www.npmjs.com/advisories/486 (package==sqlserver)

[+] tao_oat|5 years ago|reply

Unfortunately, targeted attacks have been seen in the wild. The `event-stream` attack linked in the post was one example. Alternatively, look at the attack on the Agama cryptocurrency wallet, where the attackers even managed to exfiltrate private wallet keys: https://komodoplatform.com/update-agama-vulnerability/

[+] spullara|5 years ago|reply

The only attack I have detailed knowledge of was targeted specifically at a company - Copay:

https://thenewstack.io/attackers-up-their-game-with-latest-n...

[+] 7373737373|5 years ago|reply

This is part of the reason why current languages and operating systems simply do not have security properties that would inhibit or entirely prevent these risks: it never mattered economically enough to implement them. Big corporations insure themselves against these risks financially (if at all), not technologically.

The other big reason has been having to maintain backward compatibility: personal computers and the programming languages built for them were only networked late compared to some mainframe systems. There have been very interesting historical networked operating systems that were far more secure in their architecture than current contenders: https://github.com/void4/notes/issues/41

[+] dane-pgp|5 years ago|reply

A useful step in parallel to this would be making sure that every NPM package is built from the source code that the metadata claims it is built from:

https://hackernoon.com/what-if-we-could-verify-npm-packages-...

[+] neil176|5 years ago|reply

There's a peculiar dynamic in the npm ecosystem that folks who publish libraries naturally fully embrace the ecosystem, and thereby have a lot of other library dependencies themselves.

I think most engineers would not have _directly_ introduced something like left-pad into their production application dependencies since that's something people would typically implement themselves, but people who publish open source libraries and embrace the ecosystem would gladly use someone else's package for that since they're also publishing with the expectation that someone will do the same with their own work.

It seems wrong to blame open source producers for using the work of other producers and thereby introducing a deep dependency tree, and yet the security concerns are completely valid. I personally don't have any ideas for a solution, but it's worth thinking about.

[+] unknown|5 years ago|reply

[deleted]

[+] greggman3|5 years ago|reply

It's worse than this, and it's not NPM specifically but rather GitHub's atrocious permission system. Tons of GitHub integrations ask for way too broad permissions, basically allowing any of those companies, or disgruntled employees, bribed employees, or breach data holders, to hack your repos.

https://games.greggman.com/game/github-permission-problem/

This isn't just npm, it's any dev who runs a library hosted from github who signed up to allow random 3rd parties write access to their repos. Could be C++ library, C#, a VSC plugin, a Unity asset. Tons of devs sharing code and giving out write access to that code.

[+] offtop5|5 years ago|reply

The Node standard library doesn't do enough compared to Python. Python in a locked down environment (you can't just install whatever you want) isn't bad.

Node is a nightmare without being able to install various packages from npm. Thus someone can remove left-pad and it's the end of the world. I switched from React Native to Flutter for mobile app development and it was one of the best decisions I've ever made.

[+] dpc_pw|5 years ago|reply

https://github.com/crev-dev/crev/

People willing to help out with `npm-crev` implementation needed. :)

[+] trollied|5 years ago|reply

npm needs to sort its quality issue first, but this could also be fixed with there being a better core javascript library.

Take https://www.npmjs.com/package/is-odd for example. This should not be a package. That it is even allowed to be one is insane. Do the developers importing it not know how to test for that themselves? Should it be part of core javascript?

Javascript is a mess, and npm is by extension.

[+] sbelskie|5 years ago|reply

Does that package exist because JavaScript lacks a modulus operator (I feel like I remember it having one), or because the operator does/doesn’t coerce things into numbers the way you’d expect? Or is it honestly just laziness?
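For reference, JavaScript does have a remainder operator (`%`). The usual pitfall is that the result takes the sign of the dividend, so `-3 % 2` is `-1` and a naive `n % 2 === 1` check misses negative odd numbers. A minimal TypeScript sketch of the difference; `naiveIsOdd` and `isOdd` are illustrative names and not the package's actual code:

```typescript
// Naive check: wrong for negative odd numbers, because -3 % 2 evaluates to -1.
function naiveIsOdd(n: number): boolean {
  return n % 2 === 1;
}

// Stricter sketch: reject non-integers (including NaN and Infinity),
// then compare the absolute value of the remainder.
function isOdd(n: number): boolean {
  if (!isFinite(n) || Math.floor(n) !== n) {
    throw new TypeError("expected an integer");
  }
  return Math.abs(n % 2) === 1;
}
```

So the operator exists and works; what a helper adds is mostly input validation and the sign handling, which is a few lines either way.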

[+] alquemist|5 years ago|reply

Brainstorm: Could it help to run 'coverage' through the web of dependencies and shake out all the code that is not explicitly exercised by an app test? Is that technically feasible? Is that cost effective?

[+] kchr|5 years ago|reply

I like the idea - it would at least help to visualize how much useless code you pull in and hopefully to make you realize how the few parts actually used could be implemented without those dependencies.

[+] rmrfrmrf|5 years ago|reply

the question of trust is the wrong question to ask imo. the bigger threat is unmaintained packages and maintainers that themselves use risky code management practices (e.g., lack of 2fa on npm). that a dependency is heavily relied on does not mean its maintainer is following best practices security-wise or that they aren't just as susceptible to social engineering as anyone else.

[+] captn3m0|5 years ago|reply

A strong set of this nature doesn't come with many guarantees beyond the past history of the said users. For example, having commit rights to Debian requires a certain level of security know-how, and being an Arch Trusted User has similar requirements (they moved to yubikeys everywhere a while back, for example).

We don't even know if all these users have 2FA enabled for their NPM accounts. Building a software distribution ecosystem that offers trust guarantees post-facto is a really hard challenge, and I think that the right answer is in providing developers better sandboxes. That's not to say this can't be used as a signal as the author suggests, just that the "strong-set-user/package=safe" guarantee doesn't have an underlying basis as of yet.

[+] tao_oat|5 years ago|reply

> just that the "strong-set-user/package=safe" guarantee doesn't have an underlying basis as of yet.

Author here. I agree. There can be no guarantees about the safety of a package based only on its maintainer(s); their accounts could be taken over, or they could be paid off, and so on. I’m hopeful about initiatives like Deno that provide better security controls built into the language.

A significant hurdle to overcome is getting npm (and all open-source) developers to think about trust in the first place. The event-stream incident happened when the previous maintainer handed over control to a random stranger that showed up. We’ve seen similar things happen in other attacks. The thought at this point is that by making trust more explicit, we might start a move in the right direction.

[+] goo6|5 years ago|reply

I will bet a lot of the NPM dependency problems can be solved if Node directly implemented many of the Web APIs. If PHP can implement a DOM parser, there's no reason Node can't implement it as well, for example.

[+] mumblerino|5 years ago|reply

There’s a pretty good reason, actually, and it’s exactly npm.

Because dependencies are managed so easily in Node, it makes no sense for Node core developers to implement more and more of the ever-expanding APIs offered by the browser. They’re better off spending that time tightening the system and perhaps offering low-level interfaces.

[+] ryan29|5 years ago|reply

The description of that dependency used by the BBC makes me wonder why trust is somehow based on popularity. What if the BBC got duped into using a dependency from a bad actor? Is that package trustworthy now?

I wonder if the package repos could come up with some type of standardized, domain verified organization namespaces. I was able to register a decent .com a couple years ago and immediately ran around registering the matching namespace everywhere. That feels a bit dumb when I have a globally unique identifier (the domain) sitting right there.

Why can't I have `example.com` as my organization on NPM? I realize there would be a little complexity in domains changing ownership or being abandoned, but I feel like that's already an issue with first come, first served namespaces. It's just glossed over with the assumption no one will ever give away their account / namespace which isn't true. Is there a way to tell if an organization's owner has changed in NPM?

A domain verified namespace could be on equal footing pretty quickly IMO. If it's limited to organizations, which makes sense to me, have a requirement for the domain owner to declare the official owner of the namespace via DNS or a text file under `/.well-known/`. Ex:

npmjs._dvnamespace.example.com TXT ryan29

Now `ryan29` can claim or take ownership of the `example.com` organization. Every time an artifact is published, that record could be checked to ensure `ryan29` still owns the organization. If it doesn't match, refuse to publish the artifact.

In effect, it's saying "example.com is delegating ultimate trust for this namespace to the user ryan29". If the domain expires, no one can publish to that namespace. If someone new registers the domain and claims the namespace by delegating trust to a new owner, that works as a good indicator that everyone pulling artifacts from the namespace should be notified there was a change in ownership.
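A rough TypeScript sketch of the publish-time check described above. The record name follows the example record; everything else is an assumption made for illustration: `TxtResolver`, `delegationName` and `mayPublish` are invented names, the resolver is injected rather than doing a real DNS lookup, and no registry implements this today as far as this thread shows.

```typescript
// Hypothetical resolver: returns the TXT values for a DNS name ([] if none).
type TxtResolver = (name: string) => string[];

// Build the record name from the example: <registry>._dvnamespace.<domain>
function delegationName(registry: string, domain: string): string {
  return registry + "._dvnamespace." + domain;
}

// Allow a publish only if the domain currently delegates the namespace
// to exactly this user. An expired or unconfigured domain yields no
// records, so nobody can publish.
function mayPublish(
  domain: string,
  user: string,
  registry: string,
  resolveTxt: TxtResolver
): boolean {
  const owners = resolveTxt(delegationName(registry, domain));
  return owners.length === 1 && owners[0] === user;
}
```

Requiring exactly one record is a design choice in this sketch, not part of the proposal; a registry could just as well allow several delegated users.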

It seems like a waste to me when I'm required to register a new identity for every package manager when I already have a globally unique, extremely valuable (to me), highly brandable identity that costs $8 / year to maintain.

Edit:

To add one more thought, I've always been of the opinion that ultimate trust needs to resolve to an individual, not an organization. That probably needs to be done via certificates or key signing and should be done by a local organization.

If I could dictate a system for that, I'd use local businesses to verify ID and sign keys. For example, I'm from Canada and would love to go into Memory Express with my ID and have them sign my GPG key.

I don't think you can get a real WoT like what I think was originally the intent for GPG. There are just too many bad actors these days. I think verifying identity and tying stuff back to a real person is the best you'll get.

And no, I don't want the current code signing style verification. It sucks and the incumbents are nothing more than a bunch of rent seeking value extractors.

[+] adammunch666|5 years ago|reply

[deleted]

[+] adammunch666|5 years ago|reply

[deleted]

[+] cryptica|5 years ago|reply

I don't like where this is going. Especially using number of dependents as a measure of trust. Popularity has nothing to do with trustworthiness (it just makes a problem less likely to occur, but when a problem does occur, it will be a lot worse; and npm has in fact encountered such issues in the past).

Just look at the real world: Is the Federal Reserve Bank a trustworthy institution? Sure, there are a lot of people using its product (the US dollar) so it's extremely popular, but is it trustworthy? Is the product actually what its users think it is?

Power structures are very much the same in open source. The ecosystem has been highly financialized; a library is popular because its author has a lot of rich friends who helped them to promote it on Twitter or elsewhere. So if you don't happen to have rich friends, does that make you untrustworthy?

This would lead to censorship of good projects from trustworthy people who have genuinely good intentions.

I think that such algorithms have done enough damage to society already.

[+] jrochkind1|5 years ago|reply

I mean... I would consider building a business based off the assumption that the Fed will operate how it documents itself to operate and not do things fraudulently or covertly, to be a lot lower risk than, say, building a business based off assuming the same of, say, Tether. Yeah, I'd say the Fed is pretty trustworthy, and the fact that a lot of people depend upon it is a signal of that (not a proof, or a guarantee, but a signal, same as in the library dependency example).

[+] ryan29|5 years ago|reply

I agree that money = popularity = trust is a risky system. Fraud and scams are high margin activities, so bad actors can end up with more money to spend than a lot of legitimate developers.

It's pretty ridiculous that we have real name policies for social networks, but the dev dependencies for a basic web app can have thousands of unnamed contributors. We really need a low friction system where individuals can start signing their code with verified identities.

If I pull in 1k dev dependencies via NPM, I should be able to get a list of the 1k developers that signed off on those packages. If no one is willing to step up and put their name on a package it shouldn't be used by major projects like React, Vue, etc. IMO.

119 comments