The algorithm is inherently flawed because it is based purely on file authorship, not on the quality of the documentation, source code, code comments, etc.
It doesn't matter if a piece of software was written by only one person. If it has great documentation, the original author can die (or, less morbidly, just abandon it) and someone else can pick it up easily.
All the Node.js modules by TJ Holowaychuk are a good example. They are extremely well written and documented, which has let alternate maintainers keep working on them even though he has moved away from the Node.js ecosystem to Go.
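To make the criticism concrete, here is a minimal sketch of an authorship-only truck factor estimate. It is not the article's code: the greedy "remove the most prolific author" order, the 50% orphaned-files threshold, and the input format (a hypothetical mapping from each file to the developers assumed to have authored it) are all assumptions. Note that nothing in it looks at documentation, comments, or code quality.

```python
from __future__ import annotations

from collections import defaultdict


def truck_factor(file_authors: dict[str, set[str]], threshold: float = 0.5) -> int:
    """Hypothetical greedy, authorship-only truck factor estimate.

    file_authors maps each file path to the set of developers assumed to be
    its authors. Developers are removed one at a time, most prolific first,
    until more than `threshold` of all files have no remaining author.
    """
    total = len(file_authors)
    remaining = {path: set(authors) for path, authors in file_authors.items()}
    removed = 0
    while True:
        orphaned = sum(1 for authors in remaining.values() if not authors)
        if orphaned > threshold * total:
            return removed
        # Count how many files each remaining developer still authors.
        counts: defaultdict[str, int] = defaultdict(int)
        for authors in remaining.values():
            for dev in authors:
                counts[dev] += 1
        if not counts:  # every file is already orphaned (e.g. an empty project)
            return removed
        # Remove the developer who authors the most files (ties go to the
        # alphabetically first name, so the result is deterministic).
        top = max(sorted(counts), key=lambda dev: counts[dev])
        for authors in remaining.values():
            authors.discard(top)
        removed += 1


# Hypothetical project: alice wrote most files, bob wrote the docs.
repo = {
    "core.py": {"alice"},
    "cli.py": {"alice"},
    "docs.md": {"bob"},
    "utils.py": {"alice", "bob"},
}
print(truck_factor(repo))  # prints 2
```

A perfectly documented project and an undocumented one with the same authorship history get the same number, which is exactly the objection raised above.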
This analysis seems deeply flawed to me. At just a glance, at least two of the projects listed at TF=2 have entire companies dedicated to them: Ansible and Elasticsearch.
According to Crunchbase, Elastic.co has between 51 and 200 employees and has raised over $104MM in venture capital. According to GitHub, at least 27 employees of Ansible, Inc. have commit access on GitHub. Even if we assume that many of these people are not "core developers", saying that the "truck factor" is /2/ is so far from accurate that it goes from merely wrong to straight-up deceitful.
Since you mentioned a project I started, Ansible: I am no longer with the company and stopped working on the project as a result, so you could consider me one of those hit by the truck.
While the number of Ansible company folks on the core team is about 3 or 4 now (much of the company is devoted to Ansible Tower as a product), I think Ansible's truck factor is close to infinite, because the code isn't complicated and any user could pick it up; the code is pretty safe from any event. The license is stuck as GPL because there was no copyright assignment (100% intentional!), though 'ansible' in this context is trademarked. So people shouldn't be worried. The code will live, because it's basic stuff that hundreds of Python developers can get their heads around (and do, judging by a lot of long-tail contributions). So, unless I'm full of it, it was designed to survive ANYTHING, on purpose: we specifically set out to keep the code non-clever and easy enough for someone to hack on wherever they needed to extend it or diverge from what we did with it.
Code that becomes a liability when its author disappears is usually complex code in a specialized problem domain, code that is complex and poorly documented or poorly commented, or code built with proprietary tools. Ansible is none of those things. If it were deep AI code in C or INTERCAL, attached to a domain that had nothing to do with AI and rarely used C or INTERCAL, maybe!
And if something is useful enough, it can survive pretty much anything, or at least be easily replaced.
That said, there are reasons to worry about dependencies, but I think they seldom stem from the factors in this article. Here's a post I wrote recently on some of them:
TL;DR: The only TF=2 we should be worried about is probably Team Fortress related. If you must worry, worry when a project has a small number of contributors, you don't understand the code, and you don't think anybody else using the project would pick it up. If you think there are 1000 people who understand it and could pick it up, it's definitely not TF=2, and this whole article is bad math.
> This analysis seems to me to be deeply flawed. At only a glance, at least two of the projects mentioned at TF=2 have entire companies that are dedicated to them: ansible and elastisearch.
Even if there's an entire company behind it, that doesn't mean everybody at that company is an expert in that software. Maybe only one or two devs have a good understanding of how it works, and the rest work on other things, write documentation, or know it only superficially.
"Truck factor" was popular in some circles many years ago. Given the number of replies saying they have never heard of "truck factor", I'm guessing Google won't help me out here. I certainly heard the term "truck factor" long before "bus factor".
My understanding of the reason it changed to "bus" was that people in the UK (who use the term "lorry" for what North Americans call a "truck") found the term awkward. That's purely anecdotal, though. It's entirely possible I was in a bubble of "truck factor" usage.
Another smell: projects with many small files will appear to have a low truck factor, whereas those that group code into very large files (edited by multiple people) will tend to have a very high truck factor.
Then there's the fact that, irrespective of code quality, not all code bit-rots equally. Some code needs frequent updates to keep up with the times, some needs minor touches to track small API changes over the years, and some is pretty much set in stone: it's virtually never touched except for language changes.
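The file-size smell is easy to reproduce with a toy model. This sketch uses a hypothetical greedy estimator (assumed: repeatedly remove the developer who authors the most files until more than half the files are orphaned, and count everyone who edits a file as one of its authors) on one imaginary codebase laid out two ways. The names and file layouts are invented for illustration.

```python
from __future__ import annotations


def truck_factor(file_authors: dict[str, set[str]], threshold: float = 0.5) -> int:
    """Hypothetical greedy estimate: repeatedly drop the developer who authors
    the most files until more than `threshold` of the files are orphaned."""
    remaining = {path: set(authors) for path, authors in file_authors.items()}
    removed = 0
    while sum(1 for a in remaining.values() if not a) <= threshold * len(remaining):
        counts: dict[str, int] = {}
        for authors in remaining.values():
            for dev in authors:
                counts[dev] = counts.get(dev, 0) + 1
        if not counts:  # nobody left to remove
            break
        top = max(sorted(counts), key=counts.__getitem__)  # ties: first name
        for authors in remaining.values():
            authors.discard(top)
        removed += 1
    return removed


devs = {"alice", "bob", "carol"}

# Layout 1: ten small single-author modules; alice wrote six of them.
small_files = {f"mod{i}.py": {"alice"} for i in range(6)}
small_files.update({
    "mod6.py": {"bob"}, "mod7.py": {"bob"},
    "mod8.py": {"carol"}, "mod9.py": {"carol"},
})

# Layout 2: the same work packed into two big files that everyone edits.
large_files = {"core.py": set(devs), "app.py": set(devs)}

print(truck_factor(small_files))  # prints 1
print(truck_factor(large_files))  # prints 3
```

Nothing about the people changed between the two layouts, only how the code is split into files.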
This is definitely an interesting metric. It would be useful to look at this when deciding on using a dependency in a project.
I suspect that the Homebrew project is a little skewed because recipes are checked into the repository. Although these are technically source code, they aren't really part of the core you need to understand in order to work on Homebrew itself.
They're all skewed, in one way or another. By number of source files, Linux is mostly drivers - but you don't need to be an expert in the kernel core to be able to write a driver.
As a developer, is it a good or a bad thing to have a high Truck Factor? There seems to be a trade-off. Management and the business want low Truck Factors, but talented developers do not want to be replaceable code monkeys. Do projects with a low Truck Factor lose specialist knowledge?
Then I wonder when the Truck Factor applies, and whether you would always want to lower it. On a few projects I worked on, invoking the Truck Factor irked me. A low Truck Factor is insulting to your skills (for every one of you, there are ten others), but a high Truck Factor can be too. Who likes to hear that management is already planning to continue your work after you are found as roadkill?
I don't even want to hear about this metric at a start-up, because how would you even optimize for it? Be glad you have something worth dying over.
It's a little dicey. Consider also that the developer who can't be fired also can't be promoted. Do you want to be the coder who made something so essential yet incomprehensible that nobody dares to fire you or promote you? Or do you want to be the coder who built something that's awesome and essential, but also simple enough to understand that anybody can maintain it, so you're free to move on to better opportunities at will? There's always somebody willing to employ the coder who can do the latter.
Then again, there are always people who have so much specialized domain knowledge and experience that it just can't be represented with any amount of good architecture, comments, and documentation. If you're in that position, do the best you can to make things clear, and try to be reasonably helpful in finding and training a replacement when the time comes.
Please take a look at the Linux Foundation's just-published census of security risks in open source projects. Truck factor would be a great addition to it.
Try to guess the truck factor for Java/OpenJDK or .NET, and you'll immediately find one reason why enterprises go with those and not... dunno, grunt.js.
Another flaw with this algorithm is that it doesn't take into account whether the authors are still active. You could have a TF of 8 where 6 of those devs have left the project, so your actual TF is 2, and it's probably worse than that, because a good chunk of the codebase is old and has no current maintainer.
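One way to account for departed authors is to drop anyone whose last commit predates some cutoff before running the estimate. The sketch below does that with a hypothetical greedy estimator (assumed: repeatedly remove the developer who authors the most files until more than half the files are orphaned). The `last_commit` dates, the cutoff, and the developer names are invented for illustration; real activity data would have to come from the repository history.

```python
from __future__ import annotations

from datetime import date


def truck_factor(file_authors: dict[str, set[str]], threshold: float = 0.5) -> int:
    """Hypothetical greedy estimate: repeatedly drop the developer who authors
    the most files until more than `threshold` of the files are orphaned."""
    remaining = {path: set(authors) for path, authors in file_authors.items()}
    removed = 0
    while sum(1 for a in remaining.values() if not a) <= threshold * len(remaining):
        counts: dict[str, int] = {}
        for authors in remaining.values():
            for dev in authors:
                counts[dev] = counts.get(dev, 0) + 1
        if not counts:  # nobody left to remove
            break
        top = max(sorted(counts), key=counts.__getitem__)  # ties: first name
        for authors in remaining.values():
            authors.discard(top)
        removed += 1
    return removed


def active_only(file_authors: dict[str, set[str]],
                last_commit: dict[str, date], cutoff: date) -> dict[str, set[str]]:
    """Keep only authors whose most recent commit is on or after `cutoff`."""
    return {
        path: {dev for dev in authors if last_commit[dev] >= cutoff}
        for path, authors in file_authors.items()
    }


file_authors = {
    "api.py": {"ann"},
    "parser.py": {"old1"},
    "storage.py": {"old2"},
    "net.py": {"old3"},
}
last_commit = {
    "ann": date(2015, 11, 30),
    "old1": date(2012, 6, 1),
    "old2": date(2013, 2, 14),
    "old3": date(2011, 9, 9),
}
cutoff = date(2015, 1, 1)  # hypothetical "still active" threshold

print(truck_factor(file_authors))                                     # prints 3
print(truck_factor(active_only(file_authors, last_commit, cutoff)))  # prints 0
```

A result of 0 means more than half of the files already have no active author, which is the "probably worse than that" case.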
mpdehaan2 (10 years ago):
http://opsrevolution.com/its-3am-do-you-know-where-your-soft...
emn13 (10 years ago):
This algorithm isn't measuring the truck factor.
dj-wonk (10 years ago):
Here's another way to look at it. If you care about an open source project, you don't want it to fail if you get run over or pulled away from it.
peterhajas (10 years ago):
- atom
- caskroom
- mongoid
- celery
- etc.
dankohn1 (10 years ago):
https://github.com/linuxfoundation/cii-census