
What Is the Truck Factor of Popular GitHub Applications?

73 points| adamnemecek | 10 years ago |mtov.github.io

32 comments

[+] NathanKP|10 years ago|reply
The algorithm is inherently flawed because it is based purely on authorship of files, and not the quality of the documentation, source code, source code comments, etc.
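For reference, an authorship-only estimate can be sketched roughly as below. This is a minimal illustration, not the article's actual implementation: it assumes a greedy heuristic that keeps removing the developer who authors the most files until more than half of the files have no author left. The function name, the 0.5 threshold, and the sample project are all assumptions made for the example.

```python
from collections import Counter


def truck_factor(file_authors, orphan_threshold=0.5):
    """Greedy estimate: remove the top author until too many files are orphaned.

    file_authors maps each file path to the set of developers counted as its
    authors. The mapping and the 0.5 threshold are illustrative assumptions.
    """
    remaining = {path: set(authors) for path, authors in file_authors.items()}
    total = len(remaining)
    removed = 0
    while total:
        orphaned = sum(1 for authors in remaining.values() if not authors)
        if orphaned / total > orphan_threshold:
            break
        counts = Counter(a for authors in remaining.values() for a in authors)
        if not counts:
            break
        # Sort first so ties are broken alphabetically and runs are repeatable.
        top_author = max(sorted(counts), key=counts.get)
        for authors in remaining.values():
            authors.discard(top_author)
        removed += 1
    return removed


# Hypothetical project: one developer wrote nearly everything.
solo_project = {
    "core.py": {"alice"},
    "cli.py": {"alice"},
    "docs.md": {"alice", "bob"},
    "util.py": {"alice"},
}
print(truck_factor(solo_project))  # prints 1
```

Nothing in the input describes documentation, comments, or readability, so a well-documented solo project and an undocumented one get the same score.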

It doesn't matter if a piece of software was written by only one person. If it has great documentation then the original author can die (or less morbidly just abandon it) and someone else can pick it up easily.

All the Node.js modules by TJ Holowaychuk are a good example. They are extremely well written and documented, which lets other maintainers keep working on them even though he has moved away from the Node.js ecosystem to Go.

[+] 5outh|10 years ago|reply
Case in point: Express is now owned by StrongLoop and is still (no pun intended) going strong.
[+] hlieberman|10 years ago|reply
This analysis seems to me to be deeply flawed. At just a glance, at least two of the projects listed at TF=2 have entire companies dedicated to them: Ansible and Elasticsearch.

According to Crunchbase, Elastic.co has between 51 and 200 employees, and has raised over $104MM in venture capital. According to GitHub, there are at least 27 employees of Ansible, Inc. that have commit access on GitHub. Even if we assume that many of these people are not "core developers", saying that the "truck-factor" is /2/ is so far from accurate as to go from merely wrong to straight-up deceitful.

[+] mpdehaan2|10 years ago|reply
It seems bogus, yes.

Since you mentioned a project I started, Ansible: I am no longer with the company and stopped working on the project as a result, so you could consider me one of those hit by the truck.

While the number of Ansible company folks on the core team is about 3 or 4 now (much of the company is devoted to Ansible Tower as a product), I think Ansible's truck factor is about infinite, because the code isn't complicated and any user could pick it up, so the code is pretty safe from any event. The license is stuck as GPL due to no copyright assignment (100% intentional!), though 'ansible' in context is trademarked. So people shouldn't be worried. The code will live, because it's basic stuff that hundreds of Python developers can get their heads around (and do, based on a lot of long-tail contributions). So, unless I'm full of it, it was designed to survive ANYTHING, on purpose: we especially sought to keep the code non-clever, and easy enough for someone to hack where they needed to extend or diverge from what we did with it.

Code that has a liability based on a disappearing author is usually the complex code, in a specific problem domain, or something complex and poorly documented or poorly commented, or built with proprietary tools. Ansible is none of those things. If it were deep AI code in C or Intercal, attached to a domain that had nothing to do with AI and where C or Intercal is uncommon, maybe!

And if something is useful enough, it can pretty much survive anything - or at least be easily replaced.

That being said, there's reason to worry about dependencies - but I think those are seldom because of the factors in this article. Here's a post on some of those I wrote recently:

http://opsrevolution.com/its-3am-do-you-know-where-your-soft...

TLDR - The only TF=2 we should be worried about is probably Team Fortress related. If you must be worried, be worried when projects have small numbers of contributors and you don't understand the code - and don't think anybody else using the project would pick it up. If you think there are 1000 people who understand it and can pick it up, it's definitely not TF=2 and this whole article is bad math.

[+] hobarrera|10 years ago|reply
> This analysis seems to me to be deeply flawed. At just a glance, at least two of the projects listed at TF=2 have entire companies dedicated to them: Ansible and Elasticsearch.

Even if there's an entire company behind it, that doesn't mean that everybody at that company is an expert in that software. Maybe just one or two devs have a good understanding of how it works, and the rest work on different things, write documentation, or know it only superficially.

[+] kabouseng|10 years ago|reply
Why does the author call it truck factor? Even the Wikipedia link in the article calls it bus factor.
[+] mikekchar|10 years ago|reply
"Truck factor" was popular in some circles many years ago. Given the number of replies that say that they have never heard of "Truck factor", I'm guessing that google won't help me out here. I certainly heard the term "truck factor" long before "bus factor".

My understanding of the reason it changed to "bus" was that people in the UK (who use the term "lorry" for what North Americans call a "truck") found the term awkward. That's purely anecdotal, though. It's entirely possible I was in a bubble of "truck factor" usage.

[+] forgottenpass|10 years ago|reply
Interchangeable terms. One is more popular than another.
[+] emn13|10 years ago|reply
Another smell: projects with small files will appear to have a low truck factor, whereas those that group code into very large files (edited by multiple people) will tend to have a very high truck factor.

Then there's the fact that, irrespective of code quality, not all code bit-rots equally. Some code needs frequent updates to stay with the times; some needs minor touches to track small API changes over the years; and some code is pretty much set in stone - it's virtually never touched except for language changes.

This algorithm isn't measuring the truck factor.
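The file-size effect can be shown with a toy example. This is only a sketch under stated assumptions: a developer counts as an author of a file if they wrote at least 10% of its lines, and the estimate greedily removes the busiest author until more than half the files are orphaned. Both rules, the function names, and the line counts are made up for illustration and are not taken from the article.

```python
from collections import Counter


def authors_of(line_counts, min_share=0.1):
    """Developers who wrote at least min_share of a file's lines (assumed rule)."""
    total = sum(line_counts.values())
    return {dev for dev, n in line_counts.items() if n / total >= min_share}


def truck_factor(file_authors, orphan_threshold=0.5):
    """Greedy estimate: drop the busiest author until most files are orphaned."""
    remaining = [set(a) for a in file_authors]
    removed = 0
    while remaining:
        orphaned = sum(not a for a in remaining) / len(remaining)
        counts = Counter(dev for a in remaining for dev in a)
        if orphaned > orphan_threshold or not counts:
            break
        top = max(sorted(counts), key=counts.get)  # alphabetical tie-break
        for a in remaining:
            a.discard(top)
        removed += 1
    return removed


# Hypothetical line counts for five equally sized modules.
modules = [{"alice": 100}, {"alice": 100}, {"alice": 100}, {"bob": 100}, {"carol": 100}]

# Layout A: one small file per module.
small_files = [authors_of(m) for m in modules]

# Layout B: the same code concatenated into a single large file.
merged = Counter()
for m in modules:
    merged.update(m)
one_large_file = [authors_of(merged)]

print(truck_factor(small_files))     # prints 1
print(truck_factor(one_large_file))  # prints 3
```

The code and the people are identical in both layouts; only the file boundaries differ.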

[+] matthewbauer|10 years ago|reply
This is definitely an interesting metric. It would be useful to look at this when deciding on using a dependency in a project.

I suspect that the Homebrew project is a little skewed because recipes are checked into the repository. Although these are technically source code, they aren't really part of the core of what is necessary to understand the code behind Homebrew's source.
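One way to check that suspicion would be to drop the recipe files before computing authorship. A minimal sketch follows; the paths and the exclusion pattern are hypothetical examples, and `core_files` is a helper invented here, not part of any tool from the article.

```python
from fnmatch import fnmatchcase


def core_files(paths, exclude_patterns):
    """Keep only paths that match none of the glob-style exclusion patterns."""
    return [
        p for p in paths
        if not any(fnmatchcase(p, pattern) for pattern in exclude_patterns)
    ]


# Hypothetical repository listing: recipes mixed in with the tool's own code.
paths = [
    "Library/Formula/wget.rb",
    "Library/Formula/git.rb",
    "Library/Homebrew/build.rb",
    "README.md",
]

print(core_files(paths, ["Library/Formula/*"]))
# prints ['Library/Homebrew/build.rb', 'README.md']
```

The filtered list can then be fed to whatever authorship analysis is being used, so the score reflects only the core code.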

[+] lemevi|10 years ago|reply
I'm pretty sure actual Homebrew's bus factor is 1; it's just 1 guy working on core Homebrew.
[+] caf|10 years ago|reply
They're all skewed, in one way or another. By number of source files, Linux is mostly drivers - but you don't need to be an expert in the kernel core to be able to write a driver.
[+] Baghard|10 years ago|reply
As a developer, is it a good or a bad thing to have a high Truck Factor? There seems to be a trade-off. Management and business want low Truck Factors, but talented developers do not want to be replaceable code monkeys. Do projects with a low Truck Factor lose specialist knowledge?

Then I wonder when the Truck Factor applies and if you would always want to lower it. For a few projects I worked on invoking this Truck Factor irked me. A low Truck Factor is insulting to your skills (for you ten others), but a high Truck Factor can be too. Who likes to hear that management is already planning to continue your work after you are found to be roadkill?

I don't even want to hear this metric in a start-up, because how would you even optimize this metric? Be glad you have something worth dying over.

[+] ufmace|10 years ago|reply
It's a little dicey. Consider also that the developer who can't be fired also can't be promoted. Do you want to be the coder who made something so essential yet incomprehensible that nobody dares to fire you or promote you? Or do you want to be the coder who built something that's awesome and essential, but also simple enough to understand that anybody can maintain it, so you're free to move on to better opportunities at will? There's always somebody willing to employ the coder who can do the latter.

Then again, there are always people who have so much specialized domain knowledge and experience that it just can't be represented with any amount of good architecture, comments, and documentation. If you're in that position, do the best you can to make things clear, and try to be reasonably helpful in finding and training a replacement when the time comes.

[+] dj-wonk|10 years ago|reply
I see the points you are making.

Here's another way to look at it. If you care about an open source project, you don't want it to fail if you get run over or pulled away from it.

[+] peterhajas|10 years ago|reply
This algorithm assumes that a user is a person, so some of its matches are incorrect:

- atom
- caskroom
- mongoid
- celery
- etc.

[+] sz4kerto|10 years ago|reply
Try to guess the truck factor for Java/OpenJDK or .NET, and you'll immediately find one reason why enterprises go with those and not ... dunno, grunt.js.
[+] adamnemecek|10 years ago|reply
Trick question, for enterprise projects it's called "Maybach factor".
[+] lamontcg|10 years ago|reply
Another flaw with this algorithm is that it doesn't take into account whether an author is still active. You could have a TF of 8, but if 6 of those are devs who have left the project, your actual TF is 2, and it's probably worse than that because a good chunk of the codebase is old with no current maintainer.
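A rough sketch of that adjustment: compute the set of developers the heuristic picks, then count only those with a recent commit. Everything here is an assumption made for illustration (the greedy heuristic, the 0.5 threshold, the cutoff date, and the sample data); it is not how the article computes anything.

```python
from collections import Counter
from datetime import date


def truck_factor_devs(file_authors, orphan_threshold=0.5):
    """Return the developers a greedy heuristic removes before most files are orphaned."""
    remaining = [set(a) for a in file_authors.values()]
    removed = []
    while remaining:
        orphaned = sum(not a for a in remaining) / len(remaining)
        counts = Counter(dev for a in remaining for dev in a)
        if orphaned > orphan_threshold or not counts:
            break
        top = max(sorted(counts), key=counts.get)  # alphabetical tie-break
        for a in remaining:
            a.discard(top)
        removed.append(top)
    return removed


# Hypothetical authorship and last-commit dates.
file_authors = {
    "parser.py": {"ann"}, "lexer.py": {"ann"},
    "net.py": {"raj"}, "db.py": {"raj"},
    "ui.py": {"kim"}, "cli.py": {"lee"},
}
last_commit = {
    "ann": date(2012, 3, 1), "raj": date(2015, 6, 1),
    "kim": date(2011, 1, 15), "lee": date(2015, 5, 20),
}
cutoff = date(2014, 7, 1)  # assumed definition of "still active"

devs = truck_factor_devs(file_authors)
active = [d for d in devs if last_commit[d] >= cutoff]
print(len(devs), len(active))  # prints: 2 1
```

Here the nominal score is 2, but only one of those two developers has committed since the cutoff.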