top | item 44600594

NIH is cheaper than the wrong dependency

310 points | todsacerdoti | 8 months ago | lewiscampbell.tech

198 comments

[+] solatic|8 months ago|reply
Author points to TigerBeetle as an example, which I wasn't familiar with, so I went down a rabbit hole.

Wow. Yeah. If you're writing a financial ledger transactions database from scratch in Zig, where safety and performance are your top two goals, so that you can credibly handle a million transactions per second on a single CPU core, well, damn right you can't afford to take any risks on introducing any dependencies. Makes absolute perfect sense to me (no sarcasm).

But the average developer, writing most line-of-business CRUD systems, well, quite honestly they're not that strong to begin with. Most dependencies may be bug-infested garbage, but they're still higher quality than what most developers are capable of. Most companies are bottlenecked by the caliber of talent they can realistically recruit and retain, and internal development standards reflect the average caliber on the payroll.

So like most advice you read on the internet, remember, everybody is in different circumstances; and polar opposite pieces of advice can both be correct in different circumstances.

[+] matklad|8 months ago|reply
(I work on TigerBeetle)

+100, context is the key thing. Both TigerBeetle and rust-analyzer have a strong culture of how things are done, but the _specific_ culture is quite different, because they solve very different problems.

That being said, I _think_ you might be pattern-matching a bit against the usual dependencies good/dependencies bad argument, and the TFA is on a different level. Note the examples of dependencies used: POSIX, ECMA-48, the web platform. These are not libraries, these are systems interfaces!

Dependency on a library is not a big deal --- if it starts to bite, you can just rewrite the code! But dependency on the underlying thing that the library does is usually something that can't be changed, or is extremely costly to change. Answering "No" to "is doing X in scope for my software?" is powerful.

To refer to the sibling comment, if there's a team whose core business is writing matrix multiplication code, they _can_ use random library #24 for something else, but often, if you apply some software design, you might realize that the product surrounding matrix multiplication _shouldn't_ handle concern #24. It doesn't mean that concern #24 isn't handled by anything at all, rather that we try to find _better_ factoring of the overall system to compartmentalize essential dependencies.

[+] whstl|8 months ago|reply
The problem I have with this line of thinking is that it only works when you're talking about bottom of the barrel developers.

The tech world has this tendency to dumb down every piece of advice so it caters to the lowest common denominator, but I honestly have never worked in an environment where developers couldn't learn to duplicate the functionality of shitty dependencies while fixing the problem at the same time.

And as much as people love to mock CRUD, bad abstractions can be a horrible drain on teams doing CRUD work. And even the popular stuff can often suck and be a timesink unless you're doing absolutely basic stuff that's just repeating the tutorial.

[+] vasco|8 months ago|reply
It doesn't even have to be related to the quality of developers. Whatever toolchain or product you use, you're using someone else's dependencies, in some places more than others. Most people aren't implementing their own matrix multiplication code, and the ones that are aren't implementing random library #24, which isn't in their core business. So this whole discussion happens in black-and-white terms, when in fact people only care if they have regulatory obligations, or a particular personal interest in implementing something, or they got attached to that idea for a specific library. But nobody is applying this fully, or they'd still be at the beach collecting sand.
[+] s1mplicissimus|8 months ago|reply
> But the average developer, writing most line-of-business CRUD systems, well, quite honestly they're not that strong to begin with.

This is a wild take. Developers don't usually get to pick the system they work on, and resource constraints exist during development. We make tradeoff decisions every day, and pulling in a "cheap" dependency will often make the difference between a shipped, working piece of software and plain nothing. You seem to have little involvement in actual software development, considering how generic and wrong this take is.

[+] panstromek|8 months ago|reply
I also don't quite like TigerBeetle as an example, because it is a very new startup. They are pre-1.0 and only a bit more than a year past their production release. It's way too early to tell whether their approach has actually paid off.
[+] LAC-Tech|8 months ago|reply
Author here!

I pointed to TigerBeetle because they're one of the few companies I know of who 1) have a philosophy about dependencies and 2) talk about it openly in an easy-to-quote document.

I'm not sure if I succeeded, but I wanted to get across that I mean "dependency" in a very broad sense. Not just libraries, or services you interact with, but also what you need to build and deploy.

It wasn't a call to simply not use dependencies. It was a call to think about them and be clear about what you're depending on. TigerBeetle have done that and given their reasoning.

[+] bob1029|8 months ago|reply
NIH is amazing as long as you are realistic about what you are taking ownership of.

For example, the cost of maintaining a bespoke web frontend "framework" that is specific to your problem domain is probably justifiable in many cases.

The same cannot be said for databases, game engines, web servers, cryptographic primitives, etc. If you have a problem so hard that no existing RDBMS or engine can support it, you should seriously question the practicality of solving your problem at all. There's probably some theoretical constraint or view of the problem space you haven't discovered yet. Reorganizing the problem is a lot cheaper than reinventing the entire SQLite test suite from zero.

[+] account42|8 months ago|reply
> The same cannot be said for databases, game engines, web servers, cryptographic primitives, etc.

It absolutely can in many cases.

- Many applications don't need much more of a database than files plus perhaps a runtime index. Spawning a database server or even embedding SQLite can be overkill in many cases.

- Most games would be better off with a bespoke game engine IMO, especially those made by small teams. The only real advantage to using an established engine is the familiar asset pipeline for replaceable artists, but you pay for that with huge overhead. Remember that custom engines, or at least heavily modified ones maintained by the game developer, used to be the norm - "just use Unity^WUnreal" is a relatively recent trend.

- There is no real difference in complexity between a simple web server and a FastCGI application.

- Not all uses of cryptographic primitives are security issues. If you calculate a hash to check against corruption, you can implement that yourself (or copy an existing implementation) instead of pulling in a huge library. If your use case is decrypting existing data offline, then again you don't really care about cryptographic security, only about getting the correctly decrypted output.
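A corruption-check hash really is small enough to write by hand. A minimal sketch in Go of FNV-1a (64-bit) - fine for integrity checks, not for anything adversarial:

```go
package main

import "fmt"

// fnv1a64 computes a 64-bit FNV-1a hash: XOR each byte in, then
// multiply by the FNV prime. A few lines replace an entire crypto
// dependency when all you need is a corruption check.
func fnv1a64(data []byte) uint64 {
	const prime uint64 = 1099511628211
	h := uint64(14695981039346656037) // FNV offset basis
	for _, b := range data {
		h ^= uint64(b)
		h *= prime
	}
	return h
}

func main() {
	fmt.Printf("%016x\n", fnv1a64([]byte("hello")))
}
```

The stdlib `hash/fnv` package exists too, which makes the point either way: this is not a problem that justifies a heavyweight dependency.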

And there are many more cases like these. Learned helplessness when it comes to "hard" subjects helps no one.

Just because some existing solution "solves" your problem doesn't mean it's the best or most efficient way to do it.

[+] DrScientist|8 months ago|reply
Why are there so many different database engines then? Is each instance of a different database out there an instance of NIH?

The answer, of course, is that every computer system of any complexity has to make trade-offs: do you need constraints, scalability, concurrency, security? Is your data of a specific form that benefits from a particular compressible storage format? Is it optimised for write-once-read-many or for read/write? Etc.

In terms of the thrust of the main article - which is about the cost of dependencies - I find that I'm typically much happier to take a dependency if that dependency is keen on minimising its own dependencies! I.e. I don't want to take on a whole forest of stuff.

In my view, many automatic dependency management systems have helped create the mess they claim to solve, and careful manual dependency management can help cut down the burden.

[+] f4c39012|8 months ago|reply
I can think of two reasons for using a third-party dependency:

1) a dependency on a third-party service provider that publishes the dependency. So long as that service provider is current, the dependency should be maintained.

2) a shortcut to code I don't want to write.

I have no argument with (1): there's a business reason, and the lifecycles should match. However, I should still expect major-version breaking changes in order to keep my application running. For (2), the wins are less clear, more dependent on the perceived complexity of what I can avoid writing.

Taking on any kind of dependency means that someone else can dictate when I need to spend time updating and testing changes that don't add anything to my product. Taking on a third-party dependency is always taking on a responsibility to maintain a codebase or the risk of not doing so.

[+] marginalia_nu|8 months ago|reply
You fairly quickly run into problems an RDBMS can't solve. They're above all incredibly hamstrung by being built around supporting concurrent mutation of the data set, and mutable data sets in general.

If you don't need that, and can assume the data is immutable, even a fairly naive index implementation will get orders of magnitude more performance out of a computer than an RDBMS will.
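For immutable data, the "index" can be a single map built in one pass - no locks, no write path, no query planner. A sketch in Go (the `Record` shape and field names are invented for illustration):

```go
package main

import "fmt"

// Record stands in for whatever immutable rows you loaded from disk.
type Record struct {
	ID   string
	Body string
}

// buildIndex makes one pass over the immutable slice and maps each ID
// to its position. With no concurrent mutation, this is the whole index.
func buildIndex(records []Record) map[string]int {
	idx := make(map[string]int, len(records))
	for i, r := range records {
		idx[r.ID] = i
	}
	return idx
}

func main() {
	data := []Record{{"a", "first"}, {"b", "second"}}
	idx := buildIndex(data)
	fmt.Println(data[idx["b"]].Body) // direct O(1) lookup; prints "second"
}
```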

[+] thesz|8 months ago|reply
That RDBMS example of yours is fascinating.

There are plenty of RDBMSes out there (Wikipedia lists some 100+ of them); there are plenty of problems most of them cannot solve, but that some of them do solve.

Those people considered the practicality of their solution and went forward with the implementation.

[+] a3w|8 months ago|reply
What is NIH? Skimmed the article. Still don't understand, after doing so twice. Google says national institute of health.
[+] michaelcampbell|8 months ago|reply
> as long as you are realistic about what you are taking ownership of

Including training and onboarding. There's a reason "popular" libraries, languages, patterns, etc. are popular. One "win" is the network effect when looking for talent - they can hit the ground at least jogging.

[+] abbotcabbot|8 months ago|reply
> Reorganizing the problem is a lot cheaper than reinventing the entire SQLite test suite from zero.

Sure, but if you aren't happy with existing DBs you are probably wrong in thinking you need a general DB instead of a file layout on a filesystem.

[+] fmajid|8 months ago|reply
Dependencies introduce risks, but not using them at all puts you at a competitive disadvantage against those who do use them and thus achieve faster development and time to market.

What you need is a process to manage dependencies:

1) Only consider open-source dependencies.

2) Introducing new dependencies requires a review. Not just a code review on the pull request introducing it, but checking the license, estimating how much work it would be to rip out, auditing it for history of security vulnerabilities or bugs, whether it is live and still gets updates, how vibrant the community is, and so on.

3) If possible, review your dependencies for possible security issues. Doing this at scale is expensive and the economics of this are still an unsolved problem (I have my ideas: https://blog.majid.info/supply-chain-vetting/).

4) Do not adopt a dependency you are not willing and able to take over the maintenance of, or fork if necessary. At a minimum, it means you have built it from source at least once, not just used binary packages maintained by someone else.

5) Preemptively fork all your dependencies. People can and do delete their repos out of pique, see left-pad.

[+] chii|8 months ago|reply
Both 4) and 5) are very important, but often forgotten.

Even for my own personal (small) projects, I've gotten hit with problems when I take an extended leave of absence and then return to a project, only to find my dependencies have become either completely outdated and unusable, or the repo was entirely deleted (with only a remote copy to build with).

I've since adopted the "fork" method: the dependency's source is forked (privately) and built; this is done recursively for its dependencies (stopping at language-level libraries), just to get familiar with their build chain. Only then will I feel good enough to add the dependency to my project (and use their publicly released artefacts from their publicly hosted repository). It's a bit of work, but the effort is spent upfront, and it removes future effort if the project lives long enough to see the ecosystem move or change direction (and you don't want to follow).

Sometimes I do find source libraries (rather than pre-built binaries) to be better in this sense. I merely need to fork and link the sources into the project, rather than having to learn anything about the build chains of dependencies...

[+] hahn-kev|8 months ago|reply
For #4, while I agree sometimes, I also recognize that SQLite (and other similar types of dependencies) are 100% going to outlive the product I'm building. So it's a bit arrogant to say my product is going to outlive them. I'm not going to build the Linux kernel just because it's a dependency.
[+] friendzis|8 months ago|reply
Re 4 and 5: not just at least once - all of your code should at the very least be buildable with the network cable yoinked out, preferably without any binary artifacts, though that is not always possible.
[+] Cthulhu_|8 months ago|reply
With 5, forking may be a bit excessive, as keeping your fork up to date adds a big maintenance burden. Vendoring dependencies (just pushing them into git, git LFS, or operating a caching proxy) is valid though, especially for long-lived software projects. Maybe less so for NodeJS based ones as their dependencies use many small files, but there's Yarn and maybe PNPM that will keep dependencies in more convenient archive files instead.
[+] devjab|8 months ago|reply
Being in the energy sector, dependencies are something we intentionally avoid, because we'd actually have to go through and review changes. What has helped this immensely is AI code assistance. One of the primary uses is writing CLI tools for code and config generation in the tool chain. All of it is about making your life easier without pulling in external dependencies.

An example is that we create OpenAPI docs with LLMs. Then we serve them with Go's net/http + FileServer, meaning we never leave the standard library. Of course the LLM itself is a third-party dependency, but when you use it to write CLI tools that then do the code generation, it never actually sees the code. That also means the quality of those CLI tools is less important, because it is their output that matters.

It only goes so long of course. I'm not sure any of our front-end engineers would want to live in a world where they weren't allowed to use React, but then, their products are treated as though they are external anyway.

Anyway, it's a lot easier to make engineers stop using "quality of life" dependencies when you give them something that still makes their lives easy.

[+] barisozmen|8 months ago|reply
Completely agree. It's one of the most important skills to know which dependency is good and which is bad.

My two cents: if a dependency is paid, then it is usually bad, because the company providing it has an incentive to lock you in.

As another point, "dependency minimalism" is a nice name for it. https://x.com/VitalikButerin/status/1880324753170256005

[+] ChrisMarshallNY|8 months ago|reply
Well, as a battle-scarred old war horse, I can say that "It Depends™."

I have found that "It Depends™" is a mantra for almost everything that I do, and experience has given me the ability to choose a fork in the road, when an "It Depends™" moment happens.

When I was younger, my world was governed by "hard and fast" rules. There could be no exceptions, and some of the worst code I wrote, consisted of the gymnastics required to shoehorn an unsuitable library/paradigm into my projects.

I tend to use a lot of self-developed dependencies. They are really distillations of things that I do frequently, so I factor them into standalone packages. That way, I don't have to reinvent the wheel, and I'm confident of the Quality and provenance.

But if there's functionality that is required, and beyond me, for whatever reason, I am always open to including dependencies, as long as I can reconcile myself to their quality and other issues.

[+] lmm|8 months ago|reply
People love to claim this, especially on this site, but in my experience it's the opposite. Many people like writing new code and will do it even when it's detrimental to the business, but 9 times out of 10 even using a "bad" dependency is far more effective than writing in-house.
[+] friendzis|8 months ago|reply
Dependencies are a double-edged sword.

Most vocal people work on the "disposable" end of software. It's cheaper for software giants to throw engineer-hours at rewriting a piece of code that has fallen into organizational obscurity than to maintain (hehe) maintainability. And there is usually no sense in small branding/webshop factories churning out high-quality, maintainable code.

However, I suggest you revisit the reason why the dreaded "enterprise patterns" exist in the first place. The main reason to use these architectures is so that five years down the line, when documentation is badly outdated, there is no organizational memory left behind the component, and the original developers have moved to different teams/departments or left the company altogether, the component is still well isolated, analyzable, and possible to work on.

Introducing external dependencies carries two inherent business risks: either support for the dependency will be dropped, meaning you have to absorb the maintenance burden yourself or switch dependencies, or it will introduce breaking changes, meaning you have to stick with an unmaintained version or update your product code. Both situations will eventually impact your feature flow, whatever it is.

The compromise between trunk and leaf (push of dependencies vs. pull of dependencies) is intrinsic to modularization and is always there; with internal components, however, the compromise is internal rather than external.

> Many people like writing new code and will do it even when it's detrimental to the business, but 9 times out of 10 even using a "bad" dependency is far more effective than writing in-house.

If you are a SaaS company, most probably yes, as the short-term outcome is what determines business success. However, if you work in an industry with safety and support requirements on software, or you put the burden of long-term support on yourself, the long-term horizon is more indicative of business success.

Remember, writing new code is almost never the bottleneck in any mature-ish organization.

[+] freetime2|8 months ago|reply
Can I ask how seriously your company takes security vulnerabilities and licensing? I used to have a pretty lax attitude toward dependencies, but that changed significantly when I joined a company that takes those things very seriously.
[+] Cthulhu_|8 months ago|reply
There's also the distinction between a library - a tool that does one particular thing and does it well - and a framework - which dictates how you structure your entire application. And this is a mindset as well.

I mainly know the original quote in the context of Go, where the community as a whole (or at least the part that is in the Slack server) eschews big frameworks in favor of the standard library, lightweight architectures, and specialized libraries. A lot of people coming from other languages come into the communities and ask whether there's anything like Spring for Go. Another common request is help with a major framework or abstraction like Gin (a web API framework) or GORM (a database ORM). But Gin is no longer really necessary, as the standard library has good support for the common web API use cases, and GORM quickly breaks down as soon as you need anything more advanced than querying a single table; not to mention it's big and complicated but has only one lead developer (I believe) and no major organization behind it.

But Gin and GORM dictate how you structure at least part of your application; the latter especially adds a layer of indirection, so you no longer ask "how do I join two tables in SQL" but "how do I join two tables in GORM", and the latter becomes very specialized very fast.

Anyway. Using the standard library is preferable; using small, focused libraries for specific problems that aren't solved by the standard library is also fine. Copy/pasting only the functions you need is also fine.

[+] Tractor8626|8 months ago|reply
You can tell it is written by C programmer because they think installing dependencies is hard.
[+] JonChesterfield|8 months ago|reply
C programmers don't install dependencies. That would be insane. Better to "vendor" them into the same tree as everything else and build them at the same time.
[+] mrheosuper|8 months ago|reply
If it's not hard, why do we have Docker?
[+] zdw|8 months ago|reply
The ubiquity criterion also informs scaling: for example, if a tooling or backend dependency is being production-deployed by a company at 10^2 or 10^3 times your scale, you're much less likely to hit insurmountable performance issues until you reach a similar scale.

They're also much more likely to find/fix bugs affecting that scale earlier than you do, and many companies are motivated to upstream those fixes.

[+] srcreigh|8 months ago|reply
Their libraries sometimes don’t even work for low scale though.

The protocol buffer compiler for Swift actually at one point crashed on unexpected fields, defeating the entire point of protos. The issue only happens when it tries to deserialize from JSON, which I guess none of them actually use at large scale.

[+] moron4hire|8 months ago|reply
I don't agree.

Some of the worst bugs I've hit have been in libraries written by very large companies, supposedly "the best and brightest" (Meta, Google, Microsoft, in that order), but it takes them forever to respond to issues.

Some issues go on for years. I've spent months working in issue trackers, discussing PRs and whether we can convince some rules-lawyer that it doesn't warrant a spec change (HINT: you never convince him), chasing that "it's faster/cheaper/easier to use a 3rd party package" dragon, only to eventually give up, write my own solution, fix the core issue, and do it in less time than I'd already wasted. And probably improve overall performance while I'm at it.

I think a lot of it depends on the exact field you're working in. If you're working in anything sniffing close to consulting, work is a constant deluge of cockamamie requests from clients who don't understand they aren't paying you enough to throw together a PhD research thesis in a month with a revolving crew of junior developers you can't grow and keep because the consulting firm won't hire enough people with any amount of experience to give the small handful of senior developers they keep dragging into every God damned meeting in the building so we can have a chance to come up for air every once in a while.

I'm at a point where I have enough confidence in my skills as a software developer that I know pretty much for certain whether I can develop a given solution. There are very few I can't. I'm not going to try to train an AI model on my own. I won't try to make my own browser. A relational database with all the ACID trimmings, no.

But I'll definitely bang out my own agentic system running off of local inference engines. I'll for sure implement an offline HTML rendering engine for the specific reports I'm trying to export to an image. I'll build a fugging graph database from scratch because apparently nobody can make one that I can convince anyone to pay for (budget: $0) that doesn't shit the bed once a week.

Most of the time, the clients say they want innovation, but what they really want is risk reduction. They wouldn't hire a consultant if it wasn't about risk, they'd put together their own team and drive forward. Being broadly capable and well-studied, while I may not be quite as fast at building that graph database example as an expert in Neo4j or whatever, we're also not getting that person and have no idea when they are showing up. If they even exist in the company, they're busy on other projects in a completely different business unit (probably not even doing that, probably stuck in meetings).

But I know I can get it done in a way that fits the schedule. Spending time reading the worst documentation known to mankind (Google's), because some drive-by said they did this once and used a Google product to do it, is probably going to end up wasting a lot of your time, only for you to realize that said drive-by didn't spend long enough actually listening to the meeting to understand the nuance of the problem. Time that you could have spent building your own and hitting your schedule with certainty.

Sorry, it's late and I'm tired from a full quarter of 12 hour days trying to rescue a project that the previous team did nothing on for the previous quarter because... IDK why. No adults in the room.

[+] compiler-guy|8 months ago|reply
This echoes the Joel on Software classic, "In Defense of Not-Invented-Here Syndrome", which gives some of the reasons Microsoft Excel won the spreadsheet wars. Of particular note:

=-=-=

“The Excel development team will never accept it,” he said. “You know their motto? ‘Find the dependencies — and eliminate them.’ They’ll never go for something with so many dependencies.”

https://www.joelonsoftware.com/2001/10/14/in-defense-of-not-...

[+] barisozmen|8 months ago|reply
"One technique for making software more robust is to minimize what your software depends on – the less that can go wrong, the less that will go wrong. Minimizing dependencies is more nuanced than just not depending on System X or System Y, but also includes minimizing dependencies on features of systems you are using."

From http://nathanmarz.com/blog/principles-of-software-engineerin...

[+] 0x000xca0xfe|8 months ago|reply
NIH can also be great if you only need a subset of a "mature" dependency but you need to really nail it.

Since it is a solved problem you'll likely find ample documentation, algorithms and code examples so there is much less that can go wrong than in a greenfield project.

At my last job we rewrote InfluxDB, as it was too resource-hungry for our embedded device, and the result was fantastic: 10x the time-series storage at a fraction of the memory and CPU utilization, much more robust and flash-friendly.

[+] tacker2000|8 months ago|reply
So after 2 mins I figured out what NIH means.

Would be nice to at least explain it once at the beginning though.

There are many articles that use acronyms, and not everyone knows WTH is going on.

[+] a3w|8 months ago|reply
"New, in-house" development? Would be my guess. Still left guessing.
[+] phplovesong|8 months ago|reply
Database drivers, crypto, etc. I'm always "installing", but for 97% of other stuff I tend to roll my own. And when I don't want to reinvent the wheel, I take time to vet the dependency:

How many LOC?

Does it have deps of its own?

Is it a maintained library?

Can I fork/fix bugs?

[+] jeffhwang|8 months ago|reply
And here I thought I was going to read a novel defense of the British healthcare system.
[+] silisili|8 months ago|reply
> Sometimes it's easier to write it yourself than install the dependency

This is definitely true, but only really relevant if you're either a solo dev or in for the long haul, so to speak. And it'll work way better for your use case too.

The problem is working with others. Others coming in are more apt to learn a known or semi-known system than "crazy thoughts from Dave, who used to work here". Especially in this market, where longevity is rare and company-specific knowledge is useless as resume fodder.

So from a dev standpoint it absolutely makes sense. From a business standpoint, probably markedly less so.

[+] midasz|8 months ago|reply
Either way, you wrap each thing that acts as a dependency, even if it's internal. I treat dependencies another team in my company delivers the same as any other third-party dependency: never use the classes directly, but always wrap them in a service or component.

When my task is to publish things on a Kafka bus, I create a publishing service that takes in an object that I control; only inside that service is there any actual talk about the dependency, and preferably even that is a bit wrapped. It's easy to go too far with this, but a little bit of wrapping keeps you safe.
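A sketch of that shape in Go - all names here (`OrderPlaced`, `BusProducer`, the topic) are invented for illustration. Callers hand the service a domain object we control; only the service knows about the bus client, which is hidden behind a narrow interface:

```go
package main

import "fmt"

// OrderPlaced is the domain object callers pass in; it knows nothing
// about Kafka, topics, or serialization.
type OrderPlaced struct {
	OrderID string
	Amount  float64
}

// BusProducer is the narrow seam the real bus client is adapted to.
type BusProducer interface {
	Send(topic string, payload []byte) error
}

// OrderPublisher is the wrapping service: serialization and topic
// choice live here, not in every caller.
type OrderPublisher struct {
	bus BusProducer
}

func NewOrderPublisher(bus BusProducer) *OrderPublisher {
	return &OrderPublisher{bus: bus}
}

func (p *OrderPublisher) PublishOrderPlaced(e OrderPlaced) error {
	payload := []byte(fmt.Sprintf(`{"order_id":%q,"amount":%g}`, e.OrderID, e.Amount))
	return p.bus.Send("orders.placed", payload)
}

// fakeBus shows the seam: swap in the real client in production,
// a fake in tests.
type fakeBus struct{ last string }

func (f *fakeBus) Send(topic string, payload []byte) error {
	f.last = topic + ": " + string(payload)
	return nil
}

func main() {
	bus := &fakeBus{}
	pub := NewOrderPublisher(bus)
	pub.PublishOrderPlaced(OrderPlaced{OrderID: "42", Amount: 9.5})
	fmt.Println(bus.last)
}
```

The payoff is exactly the one described above: if the bus dependency changes, only `OrderPublisher` and the adapter behind `BusProducer` are touched.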

[+] fireattack|8 months ago|reply
What's NIH?
[+] rjh29|8 months ago|reply
Short for Not Invented Here syndrome: when developers ignore existing libraries or solutions and implement their own version. Usually this is meant negatively, i.e. developers thinking they -need- to implement their own version, at great expense, when they could have just used an off-the-shelf library. However, this article posits that writing your own code is a good thing in some cases.