The security of package managers is something we're going to have to fix.
Some years ago, in offices, computers were routinely infected or made unusable because the staff were downloading and installing random screen savers from the internet. The IT staff would have to go around and scold people not to do this.
If you've looked at the transitive dependency graphs of modern packages, it's hard to not feel we're doing the same thing.
In the linked piece, Russ Cox notes that the cost of adding a bad dependency is the sum of the cost of each possible bad outcome times its probability. But then he speculates that for personal projects that cost may be near zero. That's unlikely. Unless developers entirely sandbox projects with untrusted dependencies from their personal data, company data, email, credentials, SSH/PGP keys, cryptocurrency wallets, etc., the cost of a bad outcome is still enormous. Even multiplied by a small probability, it has to be considered.
As dependency graphs get deeper, this probability, however small, only increases.
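Cox's expected-cost framing is easy to make concrete. A toy sketch, where every outcome and every number is invented purely for illustration, not taken from the essay or from real incident data:

```python
# Expected cost of a bad dependency: sum over possible bad outcomes of
# (cost of the outcome) * (probability it happens).
# All names and numbers below are made up for illustration.
bad_outcomes = [
    ("leaked SSH keys / credentials",  50_000, 0.001),
    ("exfiltrated company data",      500_000, 0.0005),
    ("build breakage (a left-pad)",     2_000, 0.01),
]

expected_cost = sum(cost * p for _name, cost, p in bad_outcomes)
print(expected_cost)  # 320.0
```

The low-probability, high-cost rows still dominate the total, which is the point: a small probability doesn't make the term negligible when the cost is enormous.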
One effect of lower-cost dependencies that Russ Cox did not mention is the increasing tendency for a project's transitive dependencies to contain two or more libraries that do the same thing. When dependencies were more expensive and consequently larger, there was more pressure for an ecosystem to settle on one package for a task. Now there might be a dozen popular packages for fancy error handling and your direct and transitive dependencies might have picked any set of them. This further multiplies the task of reviewing all of the code important to your program.
Linux distributions had to deal with this problem of trust long ago. It's instructive to see how much more careful they were about it. Becoming a Debian Developer involves a lengthy process of showing commitment to their values and requires meeting another member in person to show identification to be added to their cryptographic web of trust. Of course, the distributions are at the end of the day distributing software written by others, and this explosion of dependencies makes it increasingly difficult for package maintainers to provide effective review. And of course, the hassles of getting a library accepted into distributions is one reason for the popularity of tools such as Cargo, NPM, CPAN, etc.
It seems that package managers, like web browsers before them, are going to have to provide some form of sandboxing. The problem is the same. We're downloading heaps of untrusted code from the internet.
It's an interesting line of inquiry to think about how many of these evaluation heuristics, which are all described as things a person can do manually, could instead be built into the package manager itself to do for you automatically.
The package manager could run the package's test suite, for instance, and warn you if the tests don't all pass, or make you jump through extra hoops to install a package that doesn't have any test coverage at all. The package manager could read the source code and tell you how idiomatically it was written. The package manager could try compiling from source with warnings on and let you know if any are thrown, and compare the compiled artifacts with the ones that ship with the package to ensure that they're identical. The package manager could check the project's commit history and warn you if you're installing a package that's no longer actively maintained. The package manager could check whether the package has a history of entries in the National Vulnerability Database. The package manager could learn what licenses you will and won't accept, and automatically filter out packages that don't fit your policies. And so on.
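Most of those checks are mechanical enough to sketch. A hypothetical shape (the `Package` fields, thresholds, and warning messages are all invented here, not taken from any real package manager):

```python
# Sketch of automated pre-install checks a package manager could run.
# Everything here is hypothetical and simplified for illustration.
from dataclasses import dataclass

@dataclass
class Package:
    name: str
    tests_pass: bool
    test_coverage: float           # fraction of lines covered, 0.0-1.0
    days_since_last_commit: int
    known_cves: int                # entries in the NVD
    license: str

def pre_install_warnings(pkg, accepted_licenses):
    """Return the warnings a package manager could surface before install."""
    warnings = []
    if not pkg.tests_pass:
        warnings.append("test suite does not pass")
    if pkg.test_coverage == 0.0:
        warnings.append("no test coverage at all")
    if pkg.days_since_last_commit > 365:
        warnings.append("no longer actively maintained")
    if pkg.known_cves > 0:
        warnings.append(f"{pkg.known_cves} known CVE entries")
    if pkg.license not in accepted_licenses:
        warnings.append(f"license {pkg.license!r} violates your policy")
    return warnings

risky = Package("leftpadded", tests_pass=False, test_coverage=0.0,
                days_since_last_commit=900, known_cves=1, license="WTFPL")
print(pre_install_warnings(risky, accepted_licenses={"MIT", "Apache-2.0"}))
```

A real implementation would pull these fields from the registry, the VCS history, and the NVD rather than take them on trust from the package itself.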
In other words, the problem right now is that package managers are undiscriminating. To them a package is a package is a package; the universe of packages is a flat plane where all packages are treated equally. But in reality all packages aren't equal. Some packages are good and others are bad, and it would be a great help to the user if the package manager could encourage discovery and reuse of the former while discouraging discovery and reuse of the latter. By taking away a little friction in some places and adding some in others, the package manager could make it easy to install good packages and hard to install bad ones.
npm (the Node.js package manager) has started doing that, revealing three combined metrics for each package in the search results. It assesses every package by popularity (number of downloads), quality (a dozen or so small heuristics about how carefully the project has been created) and maintenance (whether the project is actively maintained, keeps its dependencies up to date and has more closed issues than open issues on GitHub). The idea is that when you search for a package, you can see at a glance the relative quality and popularity of the modules you're choosing between.
It's not perfect: there's no way to tell if the packages under consideration are written in a consistent style or if they have thorough unit tests, but it's a clever idea. And by rating packages on these metrics they encourage a reasonable set of best practices (write a readme and a changelog, use whitelists / blacklists, close issues on GitHub, etc). The full list of metrics is here:
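The combination step itself is trivial; a toy version (the weights here are invented, and npm's real formula differs):

```python
# Toy version of an npm-style combined search score: a weighted blend
# of popularity, quality, and maintenance, each normalized to [0, 1].
# The weights are made up; npm's actual scoring is more elaborate.
def combined_score(popularity, quality, maintenance,
                   weights=(0.3, 0.45, 0.25)):
    wp, wq, wm = weights
    return wp * popularity + wq * quality + wm * maintenance

# A hugely popular but abandoned package can rank below a modestly
# popular, well-kept one:
print(round(combined_score(0.95, 0.30, 0.10), 3))  # 0.445
print(round(combined_score(0.40, 0.90, 0.90), 3))  # 0.75
```

The hard part is not the blending but computing honest inputs, especially quality, which is why npm leans on many small heuristics rather than one number.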
To some extent, R works like this... packages are only on CRAN if they pass automatic checks, and there's a pretty strong culture of testing with testthat. You can have your own package on GitHub or an external repo, but then that's a non-standard, extra step for installing the package.
1) Suppose we have pseudonym reputation ("error notice probability"): anyone can create a pseudonym and start auditing code, marking the parts of the code they have inspected. Those marks are publicly associated with the pseudonym (after enough operation, and the eventual finding of bugs by others, the "noticing probability" can be computed+).
2) Consider the birthday paradox: reviewers each sampling code uniformly at random will pile up on the same spots, while with coordinated attention we can spread review more uniformly...
+ Of course there are different kinds of issues, e.g. new features, arguments about whether something is an improvement or whether it was an overlooked defect, etc., but future patch types don't necessarily correlate with the individuals who inspected the code...
ALSO: I still believe formal verification is actually counterintuitively cheaper (money and time) and less effort per achieved certainty. But as long as most people refuse to believe this, I encourage strategies like these...
This paper lends significant legitimacy to a casual observation that I've been concerned about for a long time: as the standard for what deserves to be a module gets ever-lowered, the law of diminishing returns kicks in really hard.
The package managers for Ruby, C#, Perl, Python, etc. each offer ~100k modules. This offers strong evidence that most developer ecosystems produce (and occasionally maintain) a predictable number of useful Things. If npm has 750k+ modules available, that suggests that the standard for what constitutes a valuable quantity of functionality is 7.5X lower in the JS community. Given that every dependency increases your potential for multi-dimensional technical risk, this seems like it should be cause for reflection. It's not an abstract risk, either... as anyone who used left-pad circa 2016 can attest.
When I create a new Rails 5.2 app, the dependency tree is 70 gems and most of them are stable to mature. When I create-react-app and see that there's 1014 items in node_modules, I have no idea what most of them actually do. And let's not forget: that's just the View layer of your fancy JS app.
When I create a new Rails 5.2.2 app, I see 79 dependencies in the entire tree. Which is about what you said, and a lot less than 1014, sure.
There are various reasons other than "low standards" that the JS ecosystem has developed to encourage even more massive dependency trees.
One is that JS projects like this are delivered to the browser. If I want one function from, say, underscore (remember that?), but depend on all of it... do I end up shipping all of it to the browser to use one function? That would be unfortunate. Newer tools mean not necessarily, but it can be tricky, and some of this culture developed before those tools existed.
But from this can develop a community culture of why _shouldn't_ I minimize the weight of dependencies? If some people only want one function and others only another, shouldn't they be separate dependencies so they can do that? And not expose themselves to possible bugs or security problems in all that other code they don't want? If dependencies can be dangerous... isn't it better to have _surgical_ dependencies including only exactly what you need, so you can take on less of them? (Makes sense at first, but of course when you have 1000 of those "surgical" dependencies, it kind of breaks down.)
Another, like someone else said, is that JS in the browser has very little stdlib/built in functions.
Another is tooling. The dependency trees were getting unmanageable in Ruby before bundler was created (which inspired the advanced dependency-management features of most subsequent package managers). We probably couldn't have as many dependencies as even Rails has without bundler. Your dependency complexity is limited by tooling support; but when tooling support comes, it brings a whole new level of dependency-management problems along with the crazy things the tooling lets you do.
These things all feed back on each other back and forth.
I'm not saying it isn't giving rise to very real problems. But it's not just an issue of people having low standards or something.
> that suggests that the standard for what constitutes a valuable quantity of functionality is 7.5X lower in the JS community.
Err, does it? The first thing this suggests is that the JS community is 7.5x larger (true relative to Ruby, an even larger factor for Perl, and not true for Python [1]). The second thing this suggests to me is that npm is X times more usable than those languages' package managers, which from my experience is true for Python (not for Ruby though; dunno about Perl).
>Is the code well-written? Read some of it. Does it look like the authors have been careful, conscientious, and consistent? Does it look like code you’d want to debug? You may need to.
This, 10,000x. I've repeated a similar mantra many, many times, and it's one of the most important reasons I refuse to use proprietary software. You should consider no software a black box, and you should consider the software you choose to use carefully, because it's your responsibility to keep it in good working order.
Making it someone else's responsibility to keep it in good working order is the value proposition behind (good) proprietary software: You give them money, they give you a support contract.
For a company with more money than development resources, or even just a company whose development resources can be more profitably focused elsewhere, this can be a quite reasonable trade to make.
That said, realize that a lot of what people think is important when it comes to reviewing code is iffy, at best. Consider git.[1] Odds are extremely high that its style does not fit the style of most startups. Now, you could take that as an argument against many startup stylings, but that is not necessarily my intent here.
To their credit, they have a very exhaustive coding guideline that is fairly open to breaking rules when need be.[2]
My personal sense, from watching developments in this space, is that we are going to have to find some way for taking on an open source dependency to be an economic transaction, with money actually changing hands. With open source, the code itself is free (in both the libre and gratis sense), but there are other places to identify value. One of them is chain of custody - is there an actual, somewhat responsible human being behind that package? Many of the most dramatic recent failures are of this nature.
Other value is in the form of security analysis / fuzzing, etc. This is real work and there should be ways to fund it.
I think the nature of business today is at a fork. Much of it seems to be scams, organized around creating the illusion of value and capturing as much of it as possible. The spirit of open source is the opposite, creating huge value and being quite inefficient at capturing it. I can see both strands prevailing. If the former, it could choke off open source innovation, and line the pockets of self-appointed gatekeepers. If the latter, we could end up with a sustainable model. I truly don't know where we'll end up.
On the other hand, it seems like making automatic payments to dependencies would be easy to screw up. Adding money to a system in the wrong way tends to attract scammers and thieves, requiring more security vigilance, while also giving people incentives to take shortcuts to make money. (Consider Internet ads, SEO, and cryptocurrency.)
Monetary incentives can be powerful and dangerous. They raise the stakes. You need to be careful when designing a system that you don't screw them up, and this can be difficult. Sometimes it can be easier to insulate people from bad incentives than to design unambiguously good incentives.
ActiveState has had this business model for quite a while. Even though you can download everything from PyPI, ActiveState has customers who are happy to pay someone else to take responsibility for dependencies.
We desperately need people using packages to pay. Otherwise it's nothing but a bunch of companies issuing demands to often unpaid people who build / maintain our shared code in these packages.
I will personally cop to having received an email complaining about a broken test in code I shared with the world and writing a less than polite email back. The code is freely given; that does not come with any obligations on my behalf.
FYI, the article's title and the sibling top-level comment by austincheney may give the wrong impression of what Russ Cox is talking about.
His essay is not saying that software dependencies themselves are the problem. Instead, he's saying that our methodology for _evaluating_ software dependencies is the problem. He could have titled it more explicitly as "Our Software Dependency Evaluation Problem".
So, the premise of the essay is already past the point of the reader determining that he will use someone else's software to achieve a goal. At that point, don't pick software packages at random or just include the first thing you see. Instead, the article lists various strategies to carefully evaluate the soundness, longevity, bugginess, etc of the software dependency.
I think it would be more productive to discuss those evaluation strategies.
For example, I'm considering a software dependency on an eventually consistent db such as FoundationDB. I have no interest, nor time, nor competency to "roll my own" distributed db. Even if I read the academic whitepapers on concurrent dbs to write my own db engine, I'd still miss several edge cases and other tricky aspects that others have solved. The question that remains is whether FoundationDB is a "good" or "bad" software dependency.
My evaluation strategies:
1) I've been keeping an eye on the project's "issues" page on GitHub[0]. I'm trying to get a sense of the bugs and resolutions. Is it a quality and rigorous codebase like SQLite? Or is it a buggy codebase like MongoDB 1.0 back in 2010 that had nightmare stories of data corruption?
2) I keep an eye out for another high-profile company that successfully used FoundationDB besides Apple.
3) and so on....
There was a recent blog post where somebody regretted their dependency on RethinkDB[1]. I don't want to repeat a similar mistake with FoundationDB.
What are your software dependency evaluation strategies? Share them.
- How easily and quickly can I tell if I made the wrong choice?
- How easily and quickly can I switch to an alternative solution, if I made the wrong choice?
To contextualize those a bit: it's often when trying to pick between some fully managed or even serverless cloud service vs. something self-managed that ticks more boxes on our requirements/features wish list.
Also, it's pretty important to consider the capabilities and resources of your team...
- Can my team and I become proficient with the service/library/whatever quickly?
Re: other high-profile companies using FoundationDB in production, I suggest checking out these two talks from the project's community conference, FoundationDB Summit: Wavefront/VMware[0], and Snowflake Computing[2].
> Does the code have tests? Can you run them? Do they pass? Tests establish that the code’s basic functionality is correct, and they signal that the developer is serious about keeping it correct.
This is one thing I thoroughly miss from Perl's CPAN: modules there have extensive testing, thanks to the CPAN Testers Network. It's not just a green/red badge but reporting is for the version triplet { module version, perl version, OS version }. I really wish NPM did the same.
That implies too much faith in tests. Tests are no better or worse than any other code. In fact, writing good tests is an art and most people cannot think about every corner case and don’t write tests that cover every code path.
So, unless you audit the tests they add no practical additional layer of trust, IMO, to just using the “package” with or without tests.
I feel like the production environment situation has changed significantly since Perl became popular. Now everything runs in "FROM alpine:latest" or whatever, and if it works once, it will mostly work everywhere. All the bugs that CPAN Testers tried to find were platform differences like "debian puts /etc/passwd somewhere other than redhat". Yes, you will absolutely at some point encounter a bug due to some difference between Broadwell and Haswell or ARM and x86_64, so you can't completely ignore the issue. But the regex to escape parentheses will probably work on ARM if it worked on x86_64. (I doubt the test cases are likely to find this kind of bug anyway, though it is sure nice when they do.)
Modest proposal: do the opposite of everything suggested in this article. After all, if you spend all your time inspecting your dependencies, what was the point of even having them in the first place?
This will ensure that maximum time possible is spent implementing new features. Everyone on your team can pitch in to accelerate this goal. Even non-technical outsiders can give valuable feedback. At the same time, this ensures minimum time spent fiddling about in a desperate attempt to secure the system and slowing everyone else down. Besides, unless you're already a fortune 500 company, no one on your team knows how to do security at all. (And even then the number of experts on your team is probably still dangerously close to zero.)
The software you ship will obviously be less secure than if you had focused any time at all on security. However, the utility of your software will skyrocket compared to what it would have been if you had sat around worrying about security. So much that your userbase is essentially forced to use your software because nothing else exists that even has a fraction of its feature set.
Sooner or later the insecurity will catch up with you. But this is the best part-- your software has so many features it is now a dependency of nearly everything else that exists. There is no chess move left except to sit down and somehow actually secure it so that arbitrarily tall stacks of even less secure software can keep being built atop it without collapsing like a house of cards.
And it's at this point that the four or five people in the world who actually understand security step in and sandbox your software. Hey, now it's more secure than a system built by a cult of devs tirelessly inspecting every little dependency before they ship anything. Problem solved.
Worse than package dependency is platform dependency. My code runs on top of 10 million lines of Kubernetes insanity that no one really understands, including the thousands of authors who wrote it. In theory, that means at the drop of a hat I can switch to a different cloud, kubectl apply, and presto! Platform independence. In reality, every cloud is slightly different, and we now depend on and work around a lot of weird quirks of Kubernetes itself. We're stuck with what we've got.
1. Convenience at cost to everything else. Easier is generally preferred over simpler. If the short-term gains destroy long-term maintenance/credibility, they will cross that bridge when they come to it, at extra expense.
2. Invented Here syndrome. Many JavaScript developers would prefer to never write original code (any original code) except as a worst-case scenario. They would even be willing to bet their jobs on this.
For me (a JavaScript developer), you have to stand on the shoulders of giants if you want to compete. Any code you re-invent is code you have to maintain.
I've found though, some engineers love to create everything from scratch, and this greatly hinders their ability to hire/fire as everything is proprietary, and usually not documented.
Most decisions are pretty grey, but for me, choosing to handle stuff yourself is never a good choice. In the same way that no one should try to create Unity from scratch, no one should try to create React from scratch. You simply can't compete with the support and effort of a global development team.
If you wanna learn though, that's a different kettle of fish. Reinvent the wheel all day. Just don't use it in production.
JS code runs on so many VMs, which makes trivial things difficult. Imagine you had 10 different JDKs and wanted to support all of them.
Of course you can write your own code, but you will be in a bubble then and "works for me" will be something that you say very often.
On the other hand, maintaining this code means that you need quite solid tests for it, which means it would be easier to just separate it into a module.
Modules are published on npm. Yeah there are lots of them.
It might be that data protection regulations start to 'encourage' movement in this area regards more careful consideration of the software dependency chain. If you pull in a malicious dependency which results in personal information being exfiltrated, I doubt the "we pulled in so many third party dependencies it was infeasible to scrutinise them" defence is going to mitigate the fines by very much.
That is the ideal path, but sadly most signs indicate the system prefers the opposite path, especially if we look at "responsible disclosure", where the contributor is expected to give a centralized temporary-secrecy agency advance warning, and we blindly have to trust them not to weaponize what essentially amounts to an endless stream of 0-days (or trust them not to turn a selective blind eye to malicious exfiltration of those 0-days).
I like (and basically agree with) the article, but I have to think it does a good job of pointing out the problem and a bad job of suggesting a solution. The sheer number of dependencies of most commercial software now, and the ratio of backlog-to-developers, basically ensures that the work required to check all your dependencies does not normally get done.
Hypothesis: it will require a massive failure, that causes the ordinary citizen (and the ordinary really, really rich citizen) to notice that something is wrong, before it changes much.
Hypothesis 2: after that happens, the first language whose dependency manager handles this problem well, will move up greatly in how widely it's used.
For a 100 man year project we have accumulated around a dozen external dependencies and only two of them are transitive (one for zipping and one for logging).
I think that’s fairly reasonable and about what I’d expect.
So as you might have guessed it’s not a node project, but that’s my point - perhaps the idea of dependencies is manageable so long as the platform allows you to keep it reasonable. Meaning, at the very least, a good standard library.
I think object-capabilities are one way to have much safer code reuse. Suppose a dependency exports a class UsefulService. In current languages, such a class can do anything - access the filesystem, access the network, etc. Suppose however that the language enforces that such actions can only be done given a reference to e.g. NetworkService, RandomService, TimeService, FilesystemService (with more or less granularity). Therefore if UsefulService is declared with `constructor(RandomService, TimeService)`, I can be sure it doesn't access any files, or hijacks any data to the network - nor do any of its transitive dependencies.
The method of sandboxing using OS processes + namespaces and what not is too heavy and unusable at such granularity.
The method of per-dependency static permission manifests in some meta-language is also poor.
The method of a single IO monad is too coarse. Also using any sort of `unsafe` should not be allowed (or be its own super-capability).
Obviously there are many tricky considerations. [For example, it is anti-modular - if suddenly UsefulService does need filesystem access, it's a breaking change, since it now must take a FilesystemService. But that sounds good to me - it's the point after all.] But does any language try to do this?
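No mainstream language enforces this today, but the shape of the discipline can be sketched even in Python, which of course can't stop a class from just importing `os` behind your back. All class and method names below are hypothetical; the point is that every effect flows through a capability object handed to the constructor:

```python
# Object-capability style, sketched in Python. Python won't enforce
# this statically; it only illustrates the shape a capability-safe
# language would enforce. All names here are hypothetical.
import random
import time

class RandomService:
    def random(self):
        return random.random()

class TimeService:
    def now(self):
        return time.time()

class UsefulService:
    # The constructor signature is an upper bound on this service's
    # effects: randomness and clock access, but no filesystem or
    # network, because it was never handed those capabilities.
    def __init__(self, rng: RandomService, clock: TimeService):
        self._rng = rng
        self._clock = clock

    def jittered_timestamp(self):
        return self._clock.now() + self._rng.random()

svc = UsefulService(RandomService(), TimeService())
print(svc.jittered_timestamp() > 0)  # True
```

In a language that actually enforced this (no ambient authority, no `unsafe` escape hatch), reviewing a dependency would shrink to reviewing which capabilities its constructors demand.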
The problem I see is not that developers choose to rely on third-party software reuse and thus create dependencies, but in how developers choose which third-party software to use. If their judgment fails, the consequences for the user can be dire.
For example, Google chose to reuse the c-ares DNS library for their Chromebooks over other available DNS libraries. It is maintained by the same person who oversees the popular libcurl.
The company issued a challenge and a $100,000 bounty for anyone who could create a persistent exploit with the Chromebook running in guest mode.
As it happened, the winning exploit relied on an off-by-one mistake in the c-ares library.
Users are not in the position to decide which (free, open-source) code is reused in a mass market corporate product. They must rely on the judgment of the developers working for the corporation.
On my personal computers, where I run a non-corporate OS, I prefer to use code from djbdns rather than c-ares for DNS queries. If someone finds an off-by-one mistake in djbdns, and this has negative consequences for me, it will be my own judgment that is to blame.
The real dependency problem is that most languages give out way too much trust by default. Any code can have any side effects.
I'd like ways to guarantee my dependencies have no side effects, like they were Haskell with no IO/unsafePerformIo, or to aggressively audit and limit those side effects. Malicious event stream package suddenly wants to use the network? No.
Another way to state this is: accept the state of the world and approach the problem using an existing methodology; treat code as untrusted and whitelist execution paths. SELinux and others do this; Intrinsic is another product that uses the same approach for app runtimes. I think this is probably the future of this problem space.
This is zero trust, and this pattern is showing up everywhere (again?).
There used to be talk about how to increase "reuse" of software, and now that systems use masses of libraries, the down-sides of heavy but casual reuse are coming to light.
I'm not sure of an easy answer. Perhaps the libraries can be reworked to make it easier to only use or extract the specific parts you need, but it's difficult to anticipate future and varied needs well. Trial and error, and blood, sweat, and tears may be the trick; but, nobody wants to pay for such because the benefits are not immediate nor guaranteed.
OOP used to be "sold" as a domain-modelling tool. It pretty much failed at that for non-trivial domains (in my opinion at least), but it made it easier to glue libraries together, and glue we did.
It's not that hard. You just need to think of dependencies as something that has non-zero benefits and non-zero costs. The problem is that, as usual, wherever a "zero" shows up in your cost/benefit analysis, you're overlooking something. Sometimes it's minor and negligible stuff, but sometimes it's not. Act accordingly.
One thing that I believe we will come to a consensus on is that there is a certain fixed cost to a dependency, analogous to the base cost a physical store incurs to manage the stock of anything on its shelves no matter how cheap the individual item, and that a dependency will need to overcome that base cost to be worthwhile. I suspect that the requisite functionality will generally need to run to the low hundreds of lines, at a minimum, and that we're going to see a general movement away from these one-line "libraries".
I say generally more than a few hundred lines because there are some exceptional cases, such as encryption algorithms or some very particular data structures like red-black trees, where they may not be a whole lot of lines per se, but they can be very dense, very details-oriented, very particular lines. Most of our code is not like that, though.
15 years ago, adding an external module was an endeavor involving approval forms, lawyers, etc., so it was frequently much easier just to develop the required functionality yourself. These days I still shudder seeing how the build goes somewhere and downloads something (usually you notice it only when whatever package manager is being used for that part of the build doesn't find the proxy, or requires a very peculiar way of specifying it; at companies with transparent proxies people don't notice even that)... completely opaque, in the sense that even if I spend some time today looking into what is downloaded and where from, tomorrow another guy will just add another thing...
Is the package-management story significantly worse for JS/Node than for other languages, or is it just a meme? If it actually does have more issues, why? Are the npm maintainers less rigorous than, say, Maven Central's?
Java is lucky enough to have a lot of very solid Apache libraries built with enterprise money. Is the culture different for js and npm?
1) Crap standard library and core language, with lots of accidental complexity. These problems are fixed or (more often) swept under the rug many times over by library authors over the course of years; then eventually the core language/lib provides its own attempt at a fix, but by then there are a ton of implementations out in the wild, and it'll take years for a typical mid-size project's dependency tree to shake out all the "deprecated" libraries for any given feature like this, if it ever does. That so many libraries have to pull in other libs for really basic stuff bloats your node_modules dir in a hurry.
2) Platform incompatibility plus no-one actually writes real Javascript that can run directly on any platform anymore anyway, so there are polyfills for yet-to-be-implemented language features and compatibility overlays galore.
3) And yes a lot of it's just the fault of Javascript "culture".
Java/.NET/C++/etc. people don’t have the urge to publish every other line of code they deem “useful”. They also don’t have the urge to import said one-liners when writing a helper method in 15 seconds is perfectly adequate.
> Adapting Leslie Lamport’s observation about distributed systems, a dependency manager can easily create a situation in which the failure of a package you didn’t even know existed can render your own code unusable.
Gold right here. Makes me wonder what Lamport’s TLA+ could be used for in the problem area.
> We do this because it’s easy, because it seems to work, because everyone else is doing it too, and, most importantly, because it seems like a natural continuation of age-old established practice.
And because we literally could not be creating software with the capabilities we are at the costs it is being produced without shared open source dependencies.
I guess this is the same thing as "it's easy", but it's actually quite a different thing when you say it like this.
[+] [-] tc|7 years ago|reply
It seems that package managers, like web browsers before them, are going to have to provide some form of sandboxing. The problem is the same. We're downloading heaps of untrusted code from the internet.
[+] [-] smacktoward|7 years ago|reply
The package manager could run the package's test suite, for instance, and warn you if the tests don't all pass, or make you jump through extra hoops to install a package that doesn't have any test coverage at all. The package manager could read the source code and tell you how idiomatically it was written. The package manager could try compiling from source with warnings on and let you know if any are thrown, and compare the compiled artifacts with the ones that ship with the package to ensure that they're identical. The package manager could check the project's commit history and warn you if you're installing a package that's no longer actively maintained. The package manager could check whether the package has a history of entries in the National Vulnerability Database. The package manager could learn what licenses you will and won't accept, and automatically filter out packages that don't fit your policies. And so on.
In other words, the problem right now is that package managers are undiscriminating. To them a package is a package is a package; the universe of packages is a flat plane where all packages are treated equally. But in reality all packages aren't equal. Some packages are good and others are bad, and it would be a great help to the user if the package manager could encourage discovery and reuse of the former while discouraging discovery and reuse of the latter. By taking away a little friction in some places and adding some in others, the package manager could make it easy to install good packages and hard to install bad ones.
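A discriminating install pass along these lines might look something like the sketch below. Every field name (lastCommit, hasTests, knownCVEs, license), every threshold, and the policy shape are invented for illustration; real registries expose different metadata.

```javascript
// Hypothetical sketch of a package manager that discriminates between
// packages instead of treating them all as equal.
function scorePackage(pkg, policy) {
  const warnings = [];
  let score = 0;

  // Add friction for unmaintained packages.
  const msPerMonth = 30 * 24 * 3600 * 1000;
  if ((Date.now() - pkg.lastCommit) / msPerMonth > 24) {
    warnings.push("no commits in 2+ years");
  } else {
    score += 1;
  }

  // Reward test coverage; warn on its absence.
  if (pkg.hasTests) score += 1;
  else warnings.push("no test suite");

  // A vulnerability history is a strong negative signal.
  if (pkg.knownCVEs > 0) warnings.push(`${pkg.knownCVEs} known CVEs`);
  else score += 1;

  // License policy acts as a hard filter, not a score.
  if (!policy.allowedLicenses.includes(pkg.license)) {
    return {
      score: 0,
      warnings: [...warnings, `license ${pkg.license} rejected by policy`],
    };
  }
  return { score, warnings };
}
```

The package manager could then sort search results by score, surface the warnings at install time, and demand an extra confirmation step below some cutoff.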
[+] [-] josephg|7 years ago|reply
It's not perfect - there's no way to tell if the packages under consideration are written in a consistent style or if they have thorough unit tests - but it's a clever idea. And by rating packages on these metrics they encourage a reasonable set of best practices (write a readme and a changelog, use whitelists / blacklists, close issues on github, etc.). The full list of metrics is here:
https://itnext.io/increasing-an-npm-packages-search-score-fb...
[+] [-] perturbation|7 years ago|reply
[+] [-] DougBTX|7 years ago|reply
[+] [-] zbentley|7 years ago|reply
That's what CPAN does by default. It provides assurance, as well as invaluable real-environment test results back to package maintainers.
[+] [-] DoctorOetker|7 years ago|reply
a vague additional idea:
can we improve rough assessment of code quality?
1) suppose we have pseudonym reputation ("error notice probability"): anyone can create a pseudonym, start auditing code, and mark the parts of the code they have inspected. those marks are publicly associated with the pseudonym (after enough operation, and the eventual discovery of bugs by others, the "noticing probability" can be computed+).
2) consider the birthday paradox: reviewers independently drawing samples from the uniform distribution will collide on the same spots and leave gaps (uncoordinated attention), while with coordinated attention we can spread review effort more uniformly...
+ of course there's different kinds of issues, i.e. new features, arguments about whether something is an improvement or whether it was an overlooked defect, etc... but the future patch types don't necessarily correlate to the individuals who inspected it...
ALSO: I still believe formal verification is actually counterintuitively cheaper (money and time) and less effort per achieved certainty. But as long as most people refuse to believe this, I encourage strategies like these...
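The bookkeeping for the reputation idea in 1) could start as simply as this. The data shapes (region ids, a set of regions where bugs later surfaced) are invented for illustration:

```javascript
// Crude sketch of the "error notice probability" idea: for each
// pseudonym, compare the code regions they marked as audited against
// the regions where someone else later found a bug they missed.
function noticeProbability(auditedRegions, regionsWithLaterBugs) {
  if (auditedRegions.length === 0) return null; // no track record yet
  const missed = auditedRegions.filter((r) => regionsWithLaterBugs.has(r)).length;
  return 1 - missed / auditedRegions.length;
}
```

As the footnote in the comment concedes, a bug later found in an audited region doesn't cleanly imply the auditor should have caught it, so this is only the crude first approximation.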
[+] [-] Sophistifunk|7 years ago|reply
[+] [-] peteforde|7 years ago|reply
The package managers for Ruby, C#, Perl, Python etc offer ~100k modules. This offers strong evidence that most developer ecosystems produce (and occasionally maintain) a predictable number of useful Things. If npm has 750k+ modules available, that suggests that the standard for what constitutes a valuable quantity of functionality is 7.5X lower in the JS community. Given that every dependency increases your potential for multi-dimensional technical risk, this seems like it should be cause for reflection. It's not an abstract risk, either... as anyone who used left-pad circa 2016 can attest.
When I create a new Rails 5.2 app, the dependency tree is 70 gems and most of them are stable to mature. When I create-react-app and see that there's 1014 items in node_modules, I have no idea what most of them actually do. And let's not forget: that's just the View layer of your fancy JS app.
[+] [-] jrochkind1|7 years ago|reply
There are various reasons other than "low standards" that the JS ecosystem has developed to encourage even more massive dependency trees.
One is that JS projects like this are delivered to the browser. If I want one function from, say, underscore (remember that?), but depend on all of it... do I end up shipping all of it to the browser to use one function? That would be unfortunate. Newer tools mean not necessarily, but it can be tricky, and some of this culture developed before those tools.
But from this can develop a community culture of why _shouldn't_ I minimize the weight of dependencies? If some people only want one function and others only another, shouldn't they be separate dependencies so they can do that? And not expose themselves to possible bugs or security problems in all that other code they don't want? If dependencies can be dangerous... isn't it better to have _surgical_ dependencies including only exactly what you need, so you can take fewer of them? (Makes sense at first, but of course when you have 1000 of those "surgical" dependencies, it kind of breaks down).
Another, like someone else said, is that JS in the browser has very little stdlib/built in functions.
Another is tooling. The dependency trees were getting unmanageable in ruby before bundler was created (which inspired the advanced dependency-management features of most package managers that came after). We probably couldn't have as many dependencies as even Rails has without bundler. Your dependency complexity is limited by tooling support; but then when tooling support arrives, it gives you a whole new level of dependency management problems that come with the crazy things the tooling lets you do.
These things all feed back on each other back and forth.
I'm not saying it isn't giving rise to very real problems. But it's not just an issue of people having low standards or something.
[+] [-] gr__or|7 years ago|reply
Err, does it? The first thing this suggests is that the JS community is 7.5x larger (true relative to Ruby, an even larger factor for Perl, and not true for Python [1]). The second thing this suggests to me is that npm is X times more usable than those languages' package managers, which from my experience is true for Python (not for Ruby though, dunno about Perl).
1 - https://insights.stackoverflow.com/survey/2018/#technology
[+] [-] ddevault|7 years ago|reply
This, 10,000x. I've repeated a similar mantra many, many times, and it's one of the most important reasons I refuse to use proprietary software. You should consider no software a black box, and consider the software you choose to use carefully, because it's your responsibility to keep it in good working order.
[+] [-] bunderbunder|7 years ago|reply
For a company with more money than development resources, or even just a company whose development resources can be more profitably focused elsewhere, this can be a quite reasonable trade to make.
[+] [-] taeric|7 years ago|reply
To their credit, they have a very exhaustive coding guideline that is fairly open to breaking rules when need be.[2]
[1] https://github.com/git/git/blob/master/wt-status.c
[2] https://github.com/git/git/blob/master/Documentation/CodingG...
[+] [-] ummonk|7 years ago|reply
[+] [-] raphlinus|7 years ago|reply
Other value is in the form of security analysis / fuzzing, etc. This is real work and there should be ways to fund it.
I think the nature of business today is at a fork. Much of it seems to be scams, organized around creating the illusion of value and capturing as much of it as possible. The spirit of open source is the opposite, creating huge value and being quite inefficient at capturing it. I can see both strands prevailing. If the former, it could choke off open source innovation, and line the pockets of self-appointed gatekeepers. If the latter, we could end up with a sustainable model. I truly don't know where we'll end up.
[+] [-] skybrian|7 years ago|reply
Monetary incentives can be powerful and dangerous. They raise the stakes. You need to be careful when designing a system that you don't screw them up, and this can be difficult. Sometimes it can be easier to insulate people from bad incentives than to design unambiguously good incentives.
[+] [-] sevensor|7 years ago|reply
[+] [-] x0x0|7 years ago|reply
I will personally cop to having received an email complaining about a broken test in code I shared with the world and writing a less than polite email back. The code is freely given; that does not come with any obligations on my behalf.
[+] [-] erlend_sh|7 years ago|reply
[+] [-] jasode|7 years ago|reply
His essay is not saying that software dependencies themselves are a problem. Instead, he's saying the methodology for _evaluating_ software dependencies is the problem. He could have titled it more explicitly as "Our Software Dependency Evaluation Problem".
So the premise of the essay is already past the point of the reader deciding to use someone else's software to achieve a goal. At that point, don't pick software packages at random or just include the first thing you see. Instead, the article lists various strategies to carefully evaluate the soundness, longevity, bugginess, etc. of the software dependency.
I think it would be more productive to discuss those evaluation strategies.
For example, I'm considering a software dependency on an eventually consistent db such as FoundationDB. I have no interest nor time nor competency to "roll my own" distributed db. Even if I read the academic whitepapers on concurrent dbs to write my own db engine, I'd still miss several edge cases and other tricky aspects that others have solved. The question that remains is whether FoundationDB is a "good" or "bad" software dependency.
My evaluation strategies:
1) I've been keeping an eye on the project's "issues" page on Github[0]. I'm trying to get a sense of the bugs and resolutions. Is it a quality and rigorous codebase like SQLite? Or is it a buggy codebase like MongoDB 1.0 back in 2010 that had nightmare stories of data corruption?
2) I keep an eye out for another high-profile company that successfully used FoundationDB besides Apple.
3) and so on....
There was a recent blog post where somebody regretted their dependency on RethinkDB[1]. I don't want to repeat a similar mistake with FoundationDB.
What are your software dependency evaluation strategies? Share them.
[0] https://github.com/apple/foundationdb/issues
[1] https://mxstbr.com/thoughts/tech-choice-regrets-at-spectrum/
[+] [-] SmirkingRevenge|7 years ago|reply
- How easily and quickly can I tell if I made the wrong choice?
- How easily and quickly can I switch to an alternative solution, if I made the wrong choice?
To contextualize those a bit: it's often a question of picking between some fully managed or even serverless cloud service vs something self-managed that ticks more boxes on our requirements/features wish-list.
Also, it's pretty important to consider the capabilities and resources of your team...
- Can my team and I become proficient with the service/library/whatever quickly?
[+] [-] teraflop|7 years ago|reply
[+] [-] davelester|7 years ago|reply
[0] https://www.youtube.com/watch?v=M438R4SlTFE
[1] https://www.youtube.com/watch?v=KkeyjFMmIf8
[+] [-] unknown|7 years ago|reply
[deleted]
[+] [-] athenot|7 years ago|reply
This is one thing I thoroughly miss from Perl's CPAN: modules there have extensive testing, thanks to the CPAN Testers Network. It's not just a green/red badge: reporting is per version triplet { module version, perl version, OS version }. I really wish NPM did the same.
Here's an example: http://deps.cpantesters.org/?module=DBD::mysql
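The useful part of that model is the aggregation key: results per { module version, perl version, OS version } triplet rather than one badge for the whole module. A toy version of that aggregation, with the report fields invented for illustration:

```javascript
// Toy aggregation in the spirit of CPAN Testers: pass/fail counts are
// kept per { module version, language version, OS } triplet, so a
// module can be green on linux/5.30 and red on freebsd/5.28 at once.
function summarize(reports) {
  const byTriplet = new Map();
  for (const r of reports) {
    const key = `${r.moduleVersion}|${r.perlVersion}|${r.os}`;
    const s = byTriplet.get(key) ?? { pass: 0, fail: 0 };
    r.ok ? s.pass++ : s.fail++;
    byTriplet.set(key, s);
  }
  return byTriplet;
}
```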
[+] [-] SomeHacker44|7 years ago|reply
That implies too much faith in tests. Tests are no better or worse than any other code. In fact, writing good tests is an art and most people cannot think about every corner case and don’t write tests that cover every code path.
So, unless you audit the tests they add no practical additional layer of trust, IMO, to just using the “package” with or without tests.
[+] [-] jrockway|7 years ago|reply
[+] [-] jancsika|7 years ago|reply
This will ensure that maximum time possible is spent implementing new features. Everyone on your team can pitch in to accelerate this goal. Even non-technical outsiders can give valuable feedback. At the same time, this ensures minimum time spent fiddling about in a desperate attempt to secure the system and slowing everyone else down. Besides, unless you're already a fortune 500 company, no one on your team knows how to do security at all. (And even then the number of experts on your team is probably still dangerously close to zero.)
The software you ship will obviously be less secure than if you had focused any time at all on security. However, the utility of your software will skyrocket compared to what it would have been if you had sat around worrying about security. So much that your userbase is essentially forced to use your software because nothing else exists that even has a fraction of its feature set.
Sooner or later the insecurity will catch up with you. But this is the best part-- your software has so many features it is now a dependency of nearly everything else that exists. There is no chess move left except to sit down and somehow actually secure it, so that arbitrarily tall stacks of even less secure software can keep being built atop it without collapsing like a house of cards.
And it's at this point that the four or five people in the world who actually understand security step in and sandbox your software. Hey, now it's more secure than a system built by a cult of devs tirelessly inspecting every little dependency before they ship anything. Problem solved.
[+] [-] et1337|7 years ago|reply
[+] [-] austincheney|7 years ago|reply
1. Convenience at cost to everything else. Easier is generally preferred over simplicity. If the short term gains destroy long term maintenance/credibility, they will cross that bridge when they come to it, at extra expense.
2. Invented Here syndrome. Many JavaScript developers would prefer to never write original code (any original code) except for as a worst case scenario. They would even be willing to bet their jobs on this.
[+] [-] RealDinosaur|7 years ago|reply
I've found though, some engineers love to create everything from scratch, and this greatly hinders their ability to hire/fire as everything is proprietary, and usually not documented.
Most decisions are pretty grey, but for me, choosing to handle stuff yourself is never a good choice. In the same way as no-one should ever try and create Unity from scratch, no-one should try to create React from scratch. You simply can't compete with the support and effort of a global development team.
If you wanna learn though, that's a different kettle of fish. Reinvent the wheel all day. Just don't use it in production.
[+] [-] drinchev|7 years ago|reply
Of course you can write your own code, but you will be in a bubble then and "works for me" will be something that you say very often.
On the other hand, maintaining this code means that you need quite solid tests for it, which means it would be easier to just separate it out into a module.
Modules are published on npm. Yeah there are lots of them.
[+] [-] jsty|7 years ago|reply
[+] [-] DoctorOetker|7 years ago|reply
[+] [-] rossdavidh|7 years ago|reply
Hypothesis: it will require a massive failure, that causes the ordinary citizen (and the ordinary really, really rich citizen) to notice that something is wrong, before it changes much.
Hypothesis 2: after that happens, the first language whose dependency manager handles this problem well, will move up greatly in how widely it's used.
[+] [-] alkonaut|7 years ago|reply
I think that’s fairly reasonable and about what I’d expect.
So as you might have guessed it’s not a node project, but that’s my point - perhaps the idea of dependencies is manageable so long as the platform allows you to keep it reasonable. Meaning, at the very least, a good standard library.
[+] [-] bluetech|7 years ago|reply
The method of sandboxing using OS processes + namespaces and what not is too heavy and unusable at such granularity.
The method of per-dependency static permission manifests in some meta-language is also poor.
The method of a single IO monad is too coarse. Also using any sort of `unsafe` should not be allowed (or be its own super-capability).
Obviously there are many tricky considerations. [For example, it is anti-modular - if suddenly UsefulService does need filesystem access, it's a breaking change, since it now must take a FilesystemService. But that sounds good to me - it's the point after all.] But does any language try to do this?
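For what it's worth, the style being asked for can be approximated today by convention, if not enforced by any mainstream language: a dependency receives every capability explicitly and has no ambient authority. A sketch, reusing the comment's hypothetical UsefulService/FilesystemService names:

```javascript
// Capability-style wiring: the service can only touch what it is
// explicitly handed. Denying filesystem access means simply not
// passing the capability in.
function makeUsefulService(caps) {
  return {
    run(path) {
      if (!caps.fs) throw new Error("no filesystem capability granted");
      return caps.fs.read(path);
    },
  };
}

// The caller can hand out a narrowed or fake view instead of the real
// fs. Note the anti-modularity the comment mentions: if the service
// later needs a new capability, the constructor's contract changes.
const fakeFs = { read: (p) => `contents of ${p}` };
const svc = makeUsefulService({ fs: fakeFs });
```

The catch is that nothing stops the dependency from just importing the real filesystem module anyway; making that impossible at the language level is the hard part being asked about.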
[+] [-] 3xblah|7 years ago|reply
For example, Google chose to reuse the c-ares DNS library for their Chromebooks over other available DNS libraries. It is maintained by the same person who oversees the popular libcurl.
The company issued a challenge and a $100,000 bounty for anyone who could create a persistent exploit with the Chromebook running in guest mode.
As it happened, the winning exploit relied on an off-by-one mistake in the c-ares library.
Users are not in the position to decide which (free, open-source) code is reused in a mass market corporate product. They must rely on the judgment of the developers working for the corporation.
On my personal computers, where I run a non-corporate OS, I prefer to use code from djbdns rather than c-ares for DNS queries. If someone finds an off-by-one mistake in djbdns, and this has negative consequences for me, it will be my own judgment that is to blame.
[+] [-] Felz|7 years ago|reply
I'd like ways to guarantee my dependencies have no side effects, like they were Haskell with no IO/unsafePerformIo, or to aggressively audit and limit those side effects. Malicious event stream package suddenly wants to use the network? No.
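Absent such language-level guarantees, the best you can do in JS today is crude runtime revocation, which mostly demonstrates why real guarantees are wanted instead. A sketch (illustrative only; a malicious package has many other routes to the network, so this is not actual security):

```javascript
// Crude "this code may not touch the network" guard: temporarily
// replace a global entry point with a throwing stub while the
// untrusted function runs, then restore it.
function withNetworkDenied(fn) {
  const realFetch = globalThis.fetch;
  globalThis.fetch = () => {
    throw new Error("network access denied");
  };
  try {
    return fn();
  } finally {
    globalThis.fetch = realFetch; // restore even if fn throws
  }
}
```

A Haskell-style effect system inverts this: instead of revoking abilities at runtime, pure code never had them to begin with.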
[+] [-] beardedwizard|7 years ago|reply
This is zero trust, and this pattern is showing up everywhere (again?).
[+] [-] tabtab|7 years ago|reply
I'm not sure of an easy answer. Perhaps the libraries can be reworked to make it easier to only use or extract the specific parts you need, but it's difficult to anticipate future and varied needs well. Trial and error, and blood, sweat, and tears may be the trick; but, nobody wants to pay for such because the benefits are not immediate nor guaranteed.
OOP used to be "sold" as a domain modelling tool. It pretty much failed at that for non-trivial domains (in my opinion at least), but it made it easier to glue libraries together, and glue we did.
[+] [-] jerf|7 years ago|reply
One thing that I believe we will come to a consensus on is that there is a certain fixed cost to a dependency, analogous to the base cost a physical store incurs to manage the stock of anything that appears on the shelves no matter how cheap the individual item may be, and that a dependency will need to overcome that base cost to be worthwhile. I suspect that the requisite functionality is generally going to require in the low hundreds of lines at a minimum to obtain, and that we're going to see a general movement away from these one-line "libraries".
I say generally more than a few hundred lines because there are some exceptional cases, such as encryption algorithms or some very particular data structures like red-black trees, where there may not be a whole lot of lines per se, but they can be very dense, very details-oriented, very particular lines. Most of our code is not like that, though.
[+] [-] trhway|7 years ago|reply
[+] [-] justinsaccount|7 years ago|reply
For example, there is a lodash package, but people don't want to include the whole thing, so now there is one package for each function:
https://www.npmjs.com/~jdalton
As far as I know this aspect is unique to nodejs/npm.
[+] [-] aaaaaaaaaab|7 years ago|reply