top | item 34274609

Ask HN: Contingency plans for a lead dev no longer avail?

37 points| uticus | 3 years ago | reply

What are typical ways larger projects handle a situation where a person closely associated with the project is no longer available?

For example, Linus for Linux, or Guido for Python, if they were suddenly and unexpectedly unable to perform their duties.

47 comments

[+] ritzaco|3 years ago|reply
The graveyard is full of indispensable people. I felt indispensable at my last job - company is still alive years later, and I wish I had left sooner. It's often hard to have perspective if you are The Person, or rely on The Person, but if the projects are important (Linux) they will evolve some kind of continuity and, if they aren't, they will die and other projects will replace them.

If you care a lot about a specific project and want to prepare then it's similar to preparing for any other disaster.

* Documentation and written processes for everything. People like to feel irreplaceable, but it's far more valuable to make yourself unnecessary. In high-stress situations, people make bad decisions, so having well-oiled playbooks is important. E.g. see [0].

* Practice and simulations: at large companies people are sometimes put on surprise forced vacation and are not allowed to communicate with their team.

* Foster a culture of encouraging people to step out of their box - if Important People's decisions can be questioned, then the rest of the team will better understand them and will be able to cope better with making them if necessary

[0] https://www.atlasobscura.com/articles/pointing-and-calling-j...

[+] londons_explore|3 years ago|reply
> Documentation and written processes for everything.

There are upsides and downsides to this. Foster a culture of 'document everything', and you'll quickly end up with a corporation which runs like molasses as everyone has to spend time writing up meeting minutes and spends 90% of their time updating documentation and arguing over if something is allowed within the XYZ procedure.

In my orgs, I generally say "a 1 pager quickstart guide on how each subsystem works, and everything else is docstrings and comments in the code".

If the system is too complex for that to be enough for a new engineer to understand what's going on, then your system should be redesigned to be simpler. We're designing apps and web services. If it's complicated, then you're doing it wrong.

[+] pestatije|3 years ago|reply
This is termed "the bus factor", and the best way to deal with it is prevention, or "increase the bus factor":

https://en.wikipedia.org/wiki/Bus_factor

[+] ChrisMarshallNY|3 years ago|reply
The only issue that I see with the Tech Industry obsession with "bus factor," is that companies insist on replacing "rockstar" developers, with really (and I mean really) low-skill developers. i.e., Linus needs to be replaceable by a JS programmer, right out of bootcamp.

This leads to truly awful codebases, "dumbed down" to the lowest common denominator. I have had people tell me not to use advanced Swift capabilities, because "a JavaScript programmer couldn't understand it." I'm a fairly advanced Swiftie, but not as leet, as some.

One of the really nice things about my current [unpaid] gig, is no one telling me that my code is "too complicated," or "too strange."

Yeah, it's strange. Learn to live with it...

[+] Tarragon|3 years ago|reply
I have been hit by a bus. Survived obviously.

At work I've been added as a 2nd technical contact and security contact for several projects because "you're less likely to be hit again". I think, but I'm not 100% sure, that is being done in a joking manner.

[+] uticus|3 years ago|reply
Tried really hard to avoid the term "hit by a bus" but yes this is exactly what is meant.

But looking for something more specific - like, how is that specifically planned out in a project? How does being "open source" affect the plan, vs closed?

Or even, what is the specific plan for if Linus gets hit by a bus? Hold a vote? Heir-apparent, decided by Linus himself? Split the responsibilities evenly?

Also, to ask from another viewpoint, what historical case studies are there? Seems like in the open-source community at least everyone is young enough that this hasn't been much of an issue yet.

[+] itsmemattchung|3 years ago|reply
You'd be surprised to see how often this occurs. For pilot projects, especially during the prototyping phase, a single point of failure (or, as pestatije pointed out, a "bus factor" problem) is sometimes inevitable. So you either run the risk of that person no longer being available, or you partner up right from the beginning, even during the initial stages (which can be expensive in human resources).
[+] emidln|3 years ago|reply
I've also heard "Key Person Risk" recently for the same concept.
[+] jesuscript|3 years ago|reply
The corporate term is "healthy redundancy".
[+] londons_explore|3 years ago|reply
People tend to be a lot less indispensable than they appear.

Ask a small dev team "what would happen if the main technical person left suddenly?", and they'll normally tell you the company's product would collapse.

But when that happens, the usual outcome of a key person leaving a small team is that the team's productivity is reduced for a few months while someone new is found and fills the space.

[+] pdimitar|3 years ago|reply
Encourage all other contributors to take ownership of subsystems. I haven't found a better way so far. Empower people to be more courageous and trust them (after a few initial and super heavy PR reviews, of course).

Basically, decentralization. There can still be the Dev X who knows it all but everyone else should be able to maintain and evolve various pieces of the system should the need arise.

[+] wreath|3 years ago|reply
This.

I spent plenty of time making sure there was enough documentation and attempting to get feedback on it, the project roadmap, milestones/schedule etc. with a team that had almost zero engagement. Once I left, they weren't able to put things together despite a clear todo list of rewiring services to finish the project. It doesn't matter what architecture, amount of documentation, or testing you have, or how clean the code is: if your team is not engaged and doesn't share the same vision of where the project should go, the project is doomed to fail once you leave.

[+] ilaksh|3 years ago|reply
"Perform their duties" -- i.e. wage slave who is actually currently indispensable. A significant portion of the time the situation is actually that they should be a cofounder but you are taking advantage of them and treating them like a normal employee. If you want to do that, then you will need to quickly find some more programmers you can take advantage of and have them train up on the code base and domain before the first guy finds a better job or starts his own business. Hopefully he starts a competing business and buries you.
[+] donkeyd|3 years ago|reply
At my start-up I got pretty much no time to train my replacement before I left. I was the technical founder / lead dev / software architect. Pretty much the entire application came from and lived in my brain.

They're still up and running and doing really well.

Edit: So it doesn't seem like this is because I documented everything well and code was good. That wasn't the case. They just managed.

[+] rqtwteye|3 years ago|reply
Linus will be a tough one. I have never seen somebody who is able to play the role of benevolent dictator for so long.
[+] ghaff|3 years ago|reply
While wishing Linus the best of health etc., it's probably one of the projects I worry about the least. It's technically a benevolent dictator model. But as others have mentioned there are a bunch of other maintainers--most notably Greg KH but there are a lot of other senior people heavily involved. It's also under a foundation and, not to put too fine a point on it, a lot of corporations with a lot of money are heavily involved.
[+] axegon_|3 years ago|reply
Greg had to take over for short periods in the past and did an excellent job. You could argue that he took over when most of the stuff was already set in stone but still, he did a good job. That said, I'm not sure if he can or would be even willing to do that in the long run.
[+] pipingdog|3 years ago|reply
Don't be indispensable, that's irresponsible.
[+] aynyc|3 years ago|reply
I unfortunately have been in this situation multiple times in my career. In my experience, everyone just grinds it out for a few months of lower productivity and eventually we ship something. Life goes on. All the suggestions for documentation, bus factor, and shared ownership are good, but I don't think anyone actually practices them, at least not in my 20-year career.
[+] chasd00|3 years ago|reply
On every project I’ve been on, every role has a backfill. The backfill is not usually as competent (expensive) as the primary, but if the primary gets hit by a bus the backfill can keep the show going. Yes, you'd better have a way to log in to all the systems the primary logs in to as well. You don’t want your backfill’s first question to be “ok, so what’s root on this box?”
[+] neilv|3 years ago|reply
For the benevolent dictator, they surround themselves with like-minded lieutenants (not the coup d'etat-inclined, but mutual respect and trust kind).

But for software engineering in general, you want a culture of people focused on working as a team towards the mission, not on individual metrics/career. This affects what you do, how you do it, how you document it, how you make decisions, etc. If there's no immediate plan for what to do when the lead dev is not available, they'll figure it out.

[+] neilv|3 years ago|reply
But, if you don't have trust in your engineering/product org (e.g., working as a team rather than for individual metrics is seen as counterproductive for individual career, execs not trusted, documenting is seen as making you expendable in headcount reduction, etc.)... then either fix your org culture or move to somewhere where you can have a better culture, IMHO.

(And if SHTF before you fix culture or move, then the question of succession for a lead dev can just be bumped up the management hierarchy at that time. And the interim solution can be creative based on the details of the immediate situation. For example, there might be an obvious successor, based on their background, skills, and availability at that moment, and the nature of that might determine whether to be thinking of them as interim, provisional, or long-term.)

[+] ROTMetro|3 years ago|reply
My experience is mainly the 90s in the Bay Area, but we always were expected to be training our replacement and we were always being trained up if we wanted to be. I've run my teams this way and it's seemed to work fine except for the dead wood employees that I probably should have fired that I was handholding/micromanaging to try and help grow. Their responsibilities, when I was suddenly gone, were the most visible things that went to crap for my previously prepared replacement.
[+] WJW|3 years ago|reply
Both Linux and Python would be absolutely fine if either of them disappeared tomorrow. There is no part of those projects that only they know about and there are more people with release rights than just the BDFL. It's usually the small projects that tend to die as soon as the main contributor vanishes, because they don't have the critical mass of people available to have redundancy.
[+] nindalf|3 years ago|reply
> Linus for Linux

Responsibility for various sub-systems is already distributed to many individuals. As for overall responsibility to maintain the canonical tree and merge patches in, it'll probably be similar to the last time Linus stepped down for personal reasons. Greg Kroah-Hartman handled the 4.19 release of Linux while Linus took a break.

[+] axegon_|3 years ago|reply
Tough call, I guess it kinda depends. To an extent we saw that with Rust when Graydon Hoare decided he didn't want to lead the development and left it to the community. And the community did a pretty good job (despite some pretty valid criticisms). Mind you, Rust was nowhere nearly as mature as Python or Linux so...
[+] gonzo41|3 years ago|reply
Take the most complex thing that person does, and take it away from them. Task two or three capable but junior people with it and accept they might stuff it up. Try to create a low-pressure environment. They will learn a lot, though, and you'll de-risk your key person problem.
[+] nradov|3 years ago|reply
Python doesn't depend on Guido van Rossum anymore. He is still highly influential but the governing organization could continue just fine without him.
[+] UncleEntity|3 years ago|reply
Guido is probably the best example since he stepped away and gave governance to the foundation(?).

Apparently not the walrus.

[+] uticus|3 years ago|reply
I would love to study how this was done. Seems kinda like making a will, in a way - what needs to be spelled out, spell it out. Assume anything else may end up happening against your wishes.

Any good refs or documentation available on what governance the foundation was given at the time he stepped down?

[+] ilc|3 years ago|reply
People step up at moments like this.

As a Lead:

When my wife had heart problems, I instantly told my team to go on auto-pilot and who would run their ceremonies, and went to support my wife. When I got called by someone with a question that was answerable by my team, I chewed them out. This was all with 100% support from my director.

As a developer:

I'm working on a product where the product has been developed by one dev largely for many years. I'm at the 6 month mark now, and honestly, if I had to step up to support everything myself, it'd suck, but the company would keep going. Would I have to change how a few things work? Probably. But would we get our releases out: Absolutely. Usually at 6 months, I could take over most products but this one is very specialized. (Which is why I joined up. I knew the ramp was longer, and I had more growth. :) )

And that is the real story: if you want to get that "truck number" up, get someone else into your codebase, and working. If not, expect some lag while they come online, but the world will keep spinning.

... Also look for people who can learn, over just raw skills in a replacement scenario. Clearly they need to know enough, but it may take you longer to find someone who can check every box, than get someone and have them ramp up to "good enough" quickly, missing a piece or two. Heck you might want 2-3 developers.. something about learning your lesson.

[+] aliqot|3 years ago|reply
> When I got called by someone with a question that was answerable by my team, I chewed them out.

Yikes. A trivial work issue is something someone might use as an excuse to check on you and make sure your wife is okay and that you're alright.

[+] sdze|3 years ago|reply
Documentation as part of Definition of Done?