Tripping over the potholes in too many libraries

[+] pronik|5 years ago|reply

From personal experience: it doesn't take a FAANG type of scale to reveal those "potholes". For most open-source projects (and especially things like pip, npm and other infrastructure and build tools) a simple, almost classical enterprise outbound proxy with authentication and MitM-style HTTPS-reencryption is more than enough kill almost every assumption they have in their code. In my case, that proxy tended to deliver incomplete files at random, which makes you discover that there is no checksumming happening. Then you build a local package proxy to help with that and then you find out that a particular library doesn't re-evaluate proxy environment variables on redirect, so that a redirect from external site to internal does not work. I can't remember filing that many issues at open-source projects as that time and it's been a draining game of whack-o-mole. At the same time you are reading tutorials called "How I did something in three hours" and you consider whether after-hours drinking might have been a better career choice.

Sorry. Enough ranting. That problem is everywhere, not just in libraries. I'm just not sure the problems lies in the users of those libraries, it's not too easy to use those libraries, it's too easy to publish them with a generic sounding name, but solving only a particular use case.

[+] im3w1l|5 years ago|reply

An unsatisfying solution is to surrender and try to be as vanilla as possible in everything. With a few carefully thought out exceptions as needed by your business.

[+] jankotek|5 years ago|reply

Could you explain a bit more? I think most pkg repos are just static directories. If you do https reencryption, and https certs do not match EVERYTHING will break.

[+] vbsteven|5 years ago|reply

I have been thinking a lot lately about a possible solution for a small portion of this problem: Microdependencies. I'll explain in more detail in the context of JS, but it applies to other languages as well.

Currently package repositories like NPM host 2 types of dependencies: big community packages (frameworks, database drivers, validation libraries, query builders,...) and smaller function-scoped utility packages (left-pad, is-even, math functions,...)

My proposal is to create a new package manager for the latter (or adapt existing ones) which handles micro-packages, typically scoped to 1 function, 1 file or a small set of files. The big difference with regular NPM is that these micropackages do not get downloaded into node_modules, but are downloaded inside the src directory and committed to source control. This means that when you add a micropackage to your project, its source code is pulled into the regular src folder and is subject to the same code review as the other code your team writes. When the micropackage is updated the changes are visible during code review as well (instead of just being an opaque version bump in package.json)

This process makes small dependencies more visible and has review builtin. It also solves the problem of having to rewrite the same logic in every project in a slightly different way.

Edit: I'm not currently working on this due to time constraints but if anyone wants to talk about this, hit me up.

[+] marcus_holmes|5 years ago|reply

I think the answer is to not import the dependency at all.

If you absolutely can't write a single function, then copy/pasting it from a set of "known good" functions would be better than importing a dependency.

Of course, just writing the function would be better. It would probably take less time than discovering the package, working out the interface for it, discovering that it doesn't actually work for your use case, discovering another, better package, etc.

[+] gioele|5 years ago|reply

CCAN (C Code Archive Network) http://ccodearchive.net/ follows a similar philosophy. CCAN contains various small, self contained snippet of C code are meant to be be downloaded in a local directory ("vendored") and used directly or with small modifications.

The most straightforward way to use CCAN is to

1) `git clone` it in a subdir;

2) create a `config.h`;

3) include whatever `lib.h` you want in your project.

Updating these micro dependencies means doing `git pull`.

All libraries come with tests and a somewhat standardized API design.

What is missing (compared to your aims) is a declarative way to lock a dependency to a specific version.

[+] pjc50|5 years ago|reply

> possible solution for a small portion of this problem

In CS research, this is known as "software components", and people have been worrying about it since at least the 1970s with very slow progress. Here's Stoustroup writing about it in the 90s, for example: https://www.semanticscholar.org/paper/Language-technical-asp...

"Reuse happens only when a variety of social conditions are favorable. However, social conditions, development processes, and design methods alone cannot guarantee success. In the end, working code must be produced"

The dream used to be that software could be like electronics: standard elements like resistors would have a small number of well-characterised parameters, and you can just pick them from a catalog. Large components would come from vendors with large datasheets that would also describe their performance, and would be guaranteed a certain period of availability such as ten years.

The end result is not quite like that. A lot of components have made it into standard libraries (what python calls "batteries included"). Everything else is available on repositories. However, because everything has to be free (and sometimes Free), that also selects for the absolute minimum maintenance effort and quality. Freedom has many huge advantages, as does not having to get purchase order or "BOM" approval every time you add a dependency to your project, but it does leave people in the "if it breaks you get to keep both pieces" situation.

[+] mleonhard|5 years ago|reply

I really like this idea.

When reviewing changes of the imported code, one will need the original change sets with comments. Therefore the system will need a format for representing these diffs. I don't think git is the right answer.

After review, the tool could upload a "passed review" signature to the central repository. Folks can see which version has been reviewed and approved by which organizations. This would let small organizations benefit from the review work done by large organizations. For example, a startup founder can feel more confident using a library version that has passed review by several of the FAANG companies.

[+] dna_polymerase|5 years ago|reply

Horrible idea, if a package get corrupted by an adversary not only would you have to clean the npm package but also hope everyone who committed the code into a repository gets the news and fixes it. At least with NPM there is the chance that a fixed version gets automatically pulled on the next build.

In addition to that, most package managers rely on metadata, like description files, the bloat introduced by these micro-packages is unfathomable.

Then there is the obvious answer to all this, use a proper language, learn it and don't pull in stuff like "left-pad" from external sources. It's a one liner. If you can't come up with that yourself get the fuck out of this industry. Seriously.

[+] manicdee|5 years ago|reply

What about git submodules?

[+] habitue|5 years ago|reply

This comes up all the time and I never understand this attitude.

Yes: dependencies are bad. Not having dependencies and writing everything yourself: also bad.

Honestly, you have to rely on heuristics in your deps. How active is the project? How simple is the thing it's doing (simple enough to probably not have major bugs, but not so simple it's faster to code it yourself) etc.

You get so much velocity from depending on external libs, it's straightforwardly bad not to take advantage of it. Are the junior devs at your company really going to write a better config handling library than the random one from github? Probably not. Maybe they'll both be bad code but the open source one has a better shot of someone else noticing it.

[+] marcus_holmes|5 years ago|reply

But once they've done it your junior devs will know how to write a config handling library. And they'll have hit a few bugs so they'll know about file locking, and concurrency problems, and overwriting, etc. It will be a great learning project.

If you teach them to just import a dependency, then that's all they know. They'll never be able to write a concurrent library, they'll just know how to import one.

Having lots of velocity is bad if there are too many potholes in the road. Slow is smooth, smooth is quick.

[+] Ar-Curunir|5 years ago|reply

And by re-implementing a library, you're just importing dependencies into your code, with the difference that now you're responsible for bug fixes and security patches, without the opportunity to inherit performance improvements contributed by others.

[+] l0b0|5 years ago|reply

There are so many problems working in tandem here:

- Leaky abstractions which are treated/documented as if they're perfect. Read a file, do a thing, write a file. Except when running in parallel, things are never that simple.

- Lots of important projects ("base of Arch Linux" level important) still don't have public repos, an official site (or even page) with references, bug tracker, or any reasonable way of contacting the maintainers. How do these projects get updated? I guess they simply don't.

- Even the best feedback mechanisms are still awful. Figuring out whether something is effectively abandoned before spending 30+ minutes writing up a bug report could take longer than writing the bug report itself, encouraging writing the absolute minimum (which ends up being less than minimum half the time). Every project is also its own unique snowflake. They want different types and amounts of information, some won't touch an issue unless it's reproducible in the very latest commit (requiring hours of setup and lots of domain knowledge to compile the damn thing), and they all have a completely different process.

- Most tools are still skewed towards one maintainer per project. Take any random GitHub abandonware as an example. You can fork it with one click, but so have 20 other people. You can put up a web site for your fork, but search engines will rank the original web site (or just the GitHub repo) highest for years. You can rename it, but now you've just burned bridges with all the other forks and the original repo, in ways which would be difficult to repair. Now try to migrate all the issues, mailing list, forum, IRC channel or other communications. It's basically hopeless unless the original maintainer is completely on board.

- Most open source software solves only the author's immediate problem. But not a single user of the software has exactly the same problem to solve as the original author. So pretty much by definition no library will solve the problem you are dealing with.

Maybe one way out of this mess for any particular language might be if the standard library was meant to grow to take over popular code, possibly with a tiered approach. You could nominate a library, then it would go through reviews, extensive testing, the whole shebang, before being accepted into a low level "this might someday be part of the standard library in some form" tier. This ideally would make it possible for example to replace core libraries in steps, steadily pushing the old one out of core and pulling a better alternative in.

[+] red_admiral|5 years ago|reply

I guess the conundrum is: for 80% of people, maybe an 80% library is good enough; and 80% of library vendors are happy to write libraries that are 80% good enough - and 80% of customers have learnt to put up with stuff that occasionally doesn't work?

To me, the best solution is a language ecosystem that distinguishes itself by having properly written standard libraries to start with - whether built in, dynamically linked or optionally included in your project one way or another. For example, key=value config file parsing and saving should absolutely not need a third-party library. That should be as much a selling point of your language ecosystem than having a package manager that can pull directly from github.

One of the few upsides to "enterprise Java" is that there's a lot of "enterprise level" libraries around for it; even a lot of the open source ones are maintained by organisations like the Apache foundation rather than some random person on the internet. At one place I used to work, we had a whitelist of approved libraries and vendors, and even if you could in theory just import a library from the web, you had to make a case to your manager and get someone to sign it off if you wanted to use something not on the list. (As a side-effect, they'd also check the licence conditions, as we were writing closed-source commercial stuff. As a young intern, I got my first lesson in what (L)GPL meant when I requested to include something in one of our projects.)

[+] hedora|5 years ago|reply

I think a lot of comments here are missing the fact that one could reimplement the config file handling library and atomic file update library in less time than it took to track down the package responsible for the bug and root cause it.

I’ve found such issues are the common case for libraries written in languages that encourage tons of tiny dependencies.

[+] stefan_|5 years ago|reply

This is so bizarre, as if fixing a config write to be atomic would somehow fix the root issue of this program was not designed with a config store in mind that can tolerate multiple instances of itself. Because of course it doesn't. Because having multiple instances in parallel was a novel requirement you introduced that the original author obviously didn't care for.

This isn't a learning experience about flock at all.

[+] kelnos|5 years ago|reply

> My guess was that people get into these situations where it seems like a library is going to be a solid "100% solution", and yet it lets you down and maybe reaches the 80% mark.

That's probably true. I think a difficulty in avoiding this is that often the person reaching for this library, if they tried to write it themselves, would only hit the 60% mark.

And I don't think there's anything wrong with that. There are people out there writing code at many, many different skill levels. Libraries -- even those that are less than perfect -- let people get things done that they might not be able to do on their own, or would do a worse job of without the library.

[+] marcus_holmes|5 years ago|reply

I'm always amazed by this attitude. That somehow the authors of random libraries on npm are amazing coders whose code quality can't be touched by mere mortals like us.

Firstly, if you've always imported a dependency to get around a problem, then yes it's going to be hard to solve that problem yourself. But it's also a learning experience. Keep doing it and eventually it won't be hard.

Secondly, the library probably isn't a perfect match for your use case, and probably contains a lot of flexibility to match a wide range of use cases. Maybe as much as 80% of the code isn't actually any use to your project. It'll be more complex code, because it's dealing with a wider set of use cases. The thing you'd write to solve your particular use case would be smaller and cleaner almost by definition. You'd end up with less code, that you understand well. This is always a better position to be in.

[+] notacoward|5 years ago|reply

Reusing code - your own or someone else's - often turns out to be a mistake. Reimplementing functionality also often turns out to be a mistake. Either way people get fooled by 80% solutions. There's no simple answer to this. I can give a simple algorithm - look at how much time you'll spend on each vs. where it's most important for you to spend your time - but you still have to apply that algorithm to your own data set to know which choice is right. And sometimes you'll get a wrong answer, and it'll be annoying (as appears happened in the OP's example). That's why they pay us the big bucks.

[+] sneak|5 years ago|reply

I think the generally accepted fix here (despite Rachel’s aversion) is to submit a PR to the file writing library that fixes the corruption issue (likely using atomic rename), then get the tool to bump the version of their dep or vendor in the fixed version.

I’ll admit, though, that the balkanization of code adds overhead from the abstraction. I just don’t think it’s a bad thing, because it’s all very new and things are still shaking out.

Imagine if the fix lands in the config file writer library and all the downstreams regularly upgrade their deps; the fix is now a lot more widespread. This is better than every single end dev knowing about atomic renames, I think.

[+] rini17|5 years ago|reply

It is likely that fix would break somebody else's code which unwittingly depends on the bug. Then, burden of educating the users would fall on maintainers. Who, most likely, aren't having any of that. (Author even linked a article about this.)

[+] peter_d_sherman|5 years ago|reply

>"In short, I think it's become entirely too easy for people using certain programming languages to use libraries from the wide world of clowns that is the Internet. Their ecosystems make it very very easy to become reliant on this stuff. Trouble is, those libraries are frequently shit. If something about it is broken, you might not be able to code around it, and may have to actually deal with them to get it fixed.

Repeat 100 times, and now you have a real problem brewing."

A future job interview question that might be asked at my future company:

"Let's say you're using a library for specific functionality, a library that you haven't written. Now let's say that there's a bug in that library that you can't work around.

How would you debug that library?"

See, there's a differentiator there.

There are programmers who can debug libraries, and then there are programmers that can merely use libraries provided to them.

Given a choice, you want a programmer who can debug libraries working for your company...

[+] fnord77|5 years ago|reply

As people here point out, using 3rd party dependencies is inevitable, unless you're a tech colossus and can afford to write everything from scratch.

For the most part, these 3rd party dependencies work fine. In my experience it is fairly rare to encounter a show-stopping bug.

I have a simple mitigation for the "potholes" though - try to take the time skim through the code of the library ahead of time and try to figure out what it is doing. That way if you do hit a problem, you can fork and fix. Doing this can also give you a sense of how well-written a library is to begin with.

[+] hpcjoe|5 years ago|reply

I've run into an application, used for monitoring, that had that exact type of bug, albeit not with a dot file.

A customer of my old business had built a little monitoring system for their compute nodes mounting a parallel file system. Their integrated test had every compute node open a particular fixed path and file name (of course the same on every node across the system) for read and write.

This "monitoring" script meant that they could have up to 2000 or so simultaneous IOs going to the same file, with no read/write locking. The tool read/wrote some number of bytes to get a performance read.

The end result was 1) lots of contention of course at the metadata layer, 2) often times spurious and incorrect reports of the parallel file system being offline (it wasn't).

We tried helping them on this, but they insisted they were doing this correctly (they weren't).

This is less about libraries with potholes per se, and more about critical applications (to a degree similar to libraries providing critical functions that need to be correct in semantics, implementation, and in error generation) that are broken due to a misdesign/mis-implementation somewhere.

With regard to her commentary on CPAN, one of the more annoying things I've dealt with in many libraries is their choices of error return. Some will throw exceptions. Some will return real errors you can process in-line. I am not a fan of throwing exceptions, and when I build larger Perl apps, I tend to insert some boilerplate exception handlers specifically due to the burns I've encountered in the past when modules do dumb things.

[+] smitty1e|5 years ago|reply

Rachel seems to trip over the concept of Division of Labor[1] to a point. There is arguably more net gain from not having everyone reinvet OpenSSL just to communicate.

Or maybe she's more against being indiscriminate about re-use.

But there just aren't many companies big enough to engineer their own complete solutions.

[1] https://en.m.wikipedia.org/wiki/Division_of_labour

[+] ohazi|5 years ago|reply

> I will never know why the team chose to handle my report by swallowing the error instead of dealing with upstream, but that's what happened.

The team avoided dealing with upstream in exactly the same way that you avoided dealing with upstream. What's so difficult to understand?

[+] _y5hn|5 years ago|reply

Honestly, you have this problem because you put up with it, or are "forced" to use certain software. Why is it everytime I venture into Python, interesting language aside, there are always multitudes of dependency problems even tied to the OS? People put up with it. Choosing quality over fast-food is a choice. Languages such as golang has largely solved much of this while still being pretty portable, and devs are encouraged to minimize deps.

Of course fast-food is tempting and may taste good, but after a while of abstention, this fades away. But it's a choice, and one might miss some opportunities.

[+] user5994461|5 years ago|reply

Python is really easy to use. I don't get what's the troubles?

    git clone https://example.com/myproject
    cd myproject
    /usr/bin/python3 -m venv env
    source ./env/bin/activate
    pip install requirements.txt
    python3 src/myapp.py

[+] yellowapple|5 years ago|reply

A bit beside the broader point of the article, but...

> They responded. What did they do? They made it catch the situation where the dotfile read failed, and made it not blow up the whole program. Instead, they just carried on as if it wasn't there.

The tone here (and in subsequent paragraphs) seems to suggest that this is somehow the "wrong" answer.

Yet, it's exactly what I would've done, since not doing this is clearly wrong. The file's corrupted with no hope of recovery, is not apparently meant to be user-serviceable (so user configuration is highly unlikely to be at risk), and is evidently not essential for the proper operation of the program (nor does it, in general, hold anything especially important - and no, some counter for when to nag about updates ain't important in this situation). There is absolutely no reason why this file's corruption should prevent normal operation; therefore, gracefully catching this error and expanding the "file doesn't exist" case to "file doesn't exist or is otherwise corrupt/unusable" and proceeding normally is absolutely the right call.

In an ideal world, they'd fix the actual corruption, too, but preventing that corruption from being an issue is the first and most critical step. I'd hardly call making software less fragile (said fragility being exactly why so many "80%" libraries are indeed 80%) a "missed opportunity"; indeed, not doing this seems like a glaring missed opportunity to make the software more robust against issues far beyond those caused by the limitations of some library.

That is:

> I will never know why the team chose to handle my report by swallowing the error instead of dealing with upstream

Because not swallowing the error would be a patently broken design, regardless of whether or not there was some library involved.

----

EDIT:

> Yeah, that's right, because I worked for G or FB or whatever, somehow any time I have a problem with something, it's because I'm trying to do something at too big of a scale? Are you shitting me? COME ON.

I mean, yes? The situation described is very obviously one where the author is trying to use a tool clearly not designed for massively-parallel execution for a task involving massively-parallel execution. Why said author is somehow surprised that said tool breaks when put under that sort of stress is beyond me.

It's akin to buying a 1 ton jack from the auto parts store and then complaining to the minimum-wage employees thereof because when you tried to use it to change the treads on a 50-ton bulldozer the jack predictably flattened like a pancake.

There's always a point where off-the-shelf parts won't cut it and you gotta roll your own solution in-house. Evidently the author has a tendency to hit that point sooner rather than later. That's pretty rare, though, and condemning the very notion of an external library because of said libraries failing under very exceptional circumstances seems absurd.

That is: no shit most libraries are written for the 80%, because the 80% ain't got the resources or motivation to deal with the consequences of NIH syndrome like the 20% might. They gotta ship something, and can't afford to let perfect be the enemy of good.

The more reasonable approach (for those currently in the 80% but hoping to eventually be in the 20%) is to start with those off-the-shelf libraries, and then be prepared to fork 'em and use them as starting points for specialized approaches (or else swap them out with specialized replacements). In this regard the author's correct in that the package ecosystems of most languages are poorly suited to this, since they offer little to no mechanism for customizing dependencies without rather painful workarounds.

[+] lowiqengineer|5 years ago|reply

> I show up with a problem ("hey, this thing keeps getting corrupted because X and Y") and suddenly it's because I'm "from" G or FB or something and I "want unreasonable things" from their stuff. So, my request is invalid, thank you drive though.

No, the answer is they're trying to deliver impact for the customer and Rachel is instead asking them to invest time in a solution that doesn't bring them any closer to that. I'd imagine most of them would be fine with fixing the root cause, but for some fucking reason everyone from Google or FB feels the need to reinvent every wheel in order to peacock and show that their IQ is the highest.

With that said, at my "normal people" non-bourgeoisie company the 3P libraries are all converted to the internal build system. If there was a fuckup of this magnitude, someone would just create a branch and bump the version number with the fix. Problem solved.

[+] sulam|5 years ago|reply

This is a pretty bad take. Rachel has said implicitly that there was no version with a fix available and explicitly that working with the maintainers to get a fix is often unreasonably difficult. This matches my own experience fairly well. So your ‘normal people’ solution won’t work.

You may also want to consider that there are good reasons for engineers and engineering management to be default suspicious of 3rdparty dependencies at large companies such as those you’ve listed. These reasons have nothing to do with peacocking or demonstrating high IQ (replicating things available elsewhere is not a way to demonstrate your intelligence, it turns out). They have much more to do with the high bar for security consciousness, unique need to deal with scale or low performance tolerance, or extreme organizational risk associated with being a company with a target on your back for every major hacker, researcher, regulator, journalist, or self-proclaimed watchdog — not to mention operating under several consent decrees already.

56 comments