The Human Cost of Tech Debt

[+] ThrowMeAway314|9 years ago|reply

I recently became CTO of a company with two moderately successful SaaS products. The second one is a fork of the first, but has been maintained by more competent people than the original product. Both are still monuments to technical debt.

About 4 million lines of PHP code, written by underpaid, sometimes not well-meaning, freelancers and students over the span of 8 years. The CEO wrote a large part, but stopped learning new techniques around 2004.

I'm bringing in competent, well-paid people through my network and trying my very best to give them as much freedom as possible. I allow and encourage greenfield modules/services that run on separate, new infrastructure for anything that can be rewritten within the timeframes, but the larger part of the job is still mind-numbing for my team, and that makes me question my wisdom.

If anyone has tips on how to steer such a ship in a direction where the work is less frustrating for my devs, I'm very open to advice.

[+] foz|9 years ago|reply

I've run across this situation many times (I'm a "senior" team lead, meaning I've been working for 25+ years). I've witnessed companies that overcame tech debt, and seen companies fail because of it. There are basically three approaches that people can take.

One is the "big re-write". They start on a new code base, and try to develop it in parallel. It takes a very long time, and the teams have to work on two solutions for some time. It's a big bang approach, and it often fails, or drags on for years.

The second is massive refactoring. It requires intensive testing and best practices, so the teams have to focus intensely on testing. However, the testing culture is often not there, which is why the code became unmanageable in the first place. It's kind of like starting over. And the new focus and discipline on testing is hard for teams to sustain without strong leadership, training, or new talent.

The last, and most effective in my opinion, is to go with a service-based, incremental approach. If the code base is not already using services, APIs must be built. Frontend/apps must be de-coupled from the legacy components. A clear domain model has to be agreed upon, and then parts of the legacy codebase are put behind APIs and de-coupled from the rest. Over time, sections are refactored independently, and the APIs can hide the legacy code away. Maybe the legacy parts are refactored and replaced, or they stay around for a while. But the key is that this approach allows multiple people or teams to work in parallel and focus on their areas. This is domain-driven design in action, and it works. New features can actually be developed sooner, even though the legacy is not replaced yet.
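To make the "put legacy behind an API and replace it piece by piece" idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the class `LegacyFacade`, the operation name `get_invoice`, and both handler functions are hypothetical and do not come from any real codebase. Callers only ever talk to the facade, so each operation can be moved to new code independently.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


def legacy_get_invoice(request: dict) -> dict:
    """Hypothetical stand-in for a call into the untouched legacy code."""
    return {"source": "legacy", "invoice_id": request["invoice_id"]}


def new_get_invoice(request: dict) -> dict:
    """Hypothetical replacement living in a new, separately deployed service."""
    return {"source": "new-service", "invoice_id": request["invoice_id"]}


class LegacyFacade:
    """Single entry point that hides which implementation serves a request."""

    def __init__(self, legacy_handlers: Dict[str, Handler]) -> None:
        self._legacy = dict(legacy_handlers)
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, operation: str, handler: Handler) -> None:
        """Route one existing operation to new code; all others stay on legacy."""
        if operation not in self._legacy:
            raise KeyError(f"unknown operation: {operation}")
        self._migrated[operation] = handler

    def call(self, operation: str, request: dict) -> dict:
        # Prefer the migrated handler when one exists, else fall back to legacy.
        handler = self._migrated.get(operation, self._legacy[operation])
        return handler(request)


facade = LegacyFacade({"get_invoice": legacy_get_invoice})
before = facade.call("get_invoice", {"invoice_id": 7})
facade.migrate("get_invoice", new_get_invoice)
after = facade.call("get_invoice", {"invoice_id": 7})
```

In this sketch the caller's code is identical before and after the migration, which is what lets teams work on different operations in parallel.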

In the end, overcoming tech debt is about people. And on larger code bases, it's more of an organizational problem than a code problem. Developers need to be able to move forward without having to navigate too much code or too many different people.

[+] whack|9 years ago|reply

I think one important step is to recognize that bugs can and will happen in the process of cleaning up this tech debt. There are 2 things you need to do, to handle this.

1. Tell your team that cleaning up the tech debt is a major company priority, and even if some mistakes are made along the way, that's an acceptable price to pay for the benefits involved. People shouldn't let the fear of breaking something dissuade them from cleaning up the mess.

2. Have your team invest heavily in building test/QA infrastructure, so that if they were to break something, it would get automatically caught and flagged before it reaches production. If the advice given in point 1 scares you, then you need to double down here to make up for it.

I've been in teams where pull requests that significantly cleaned up the code base were literally rejected, because people were paranoid that any change at all could break something in unforeseen ways. People just bunkered up into an "if it ain't broke, don't touch it" mentality, which meant that the tech debt problem never ever improved. Ultimately, the only way to get yourself out of this hole is by encouraging people to take risks even if it involves them sometimes failing, and building better safety nets to catch them on the occasions when they do fail.

[+] jestar_jokin|9 years ago|reply

Check out the book "Working Effectively with Legacy Code", by Michael Feathers[0].

I believe the basic approach is to write tests to capture the current behaviour at the system boundaries - for a web application, this might take the form of automated end-to-end tests (Selenium WebDriver) - then, progressively refactor and unit test components and code paths. By the end of the process, you'll end up with a comprehensive regression suite, giving developers the confidence to make changes with impunity - whether that's refactoring to eliminate more technical debt and speed up development, or adding features to fulfill business needs.

This way, you can take a gradual, iterative approach to cleaning up the system, which should boost morale (a little bit of progress made every iteration), and minimises risk (you're not replacing an entire system at once).
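As a rough illustration of "capture the current behaviour, then refactor", here is a small characterization-test sketch in Python. Everything in it is an assumption made for the example: `legacy_price`, `refactored_price`, and the discount rule are hypothetical, and a real system would record behaviour at its actual boundaries (HTTP responses, database rows, and so on) instead of a single function.

```python
import json
from typing import Callable, Iterable, List, Tuple


def legacy_price(quantity: int, unit_cents: int) -> int:
    """Hypothetical legacy routine: 10% off (rounded down) from 10 items up."""
    total = quantity * unit_cents
    if quantity >= 10:
        total = total - total // 10
    return total


def refactored_price(quantity: int, unit_cents: int) -> int:
    """Hypothetical cleaned-up version that must behave identically."""
    total = quantity * unit_cents
    discount = total // 10 if quantity >= 10 else 0
    return total - discount


def record_snapshot(func: Callable[..., int],
                    cases: Iterable[Tuple[int, int]]) -> str:
    """Run the current code and store whatever it returns, quirks included."""
    results = [{"args": list(args), "result": func(*args)} for args in cases]
    return json.dumps(results, sort_keys=True)


def compare_to_snapshot(func: Callable[..., int], snapshot: str) -> List[str]:
    """Return one message per case where func differs from the recording."""
    failures = []
    for entry in json.loads(snapshot):
        actual = func(*entry["args"])
        if actual != entry["result"]:
            failures.append(
                f"args={entry['args']}: expected {entry['result']}, got {actual}"
            )
    return failures


CASES = [(1, 100), (9, 250), (10, 100), (12, 333)]
snapshot = record_snapshot(legacy_price, CASES)
failures = compare_to_snapshot(refactored_price, snapshot)
```

The snapshot records what the legacy code does, not what it should do, so an empty `failures` list only says the refactoring preserved behaviour on the recorded cases.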

I've used this approach to rewrite a Node.js API that was tightly coupled to MongoDB, and migrated it to PostgreSQL.

[0] https://www.amazon.com/Working-Effectively-Legacy-Michael-Fe...

[+] mrweasel|9 years ago|reply

The key point for me is the fact that you're allowing your developers to actually fix the issues. Most developers I know love improving old code, even if the end-users will never notice.

Personally I don't mind having to deal with the technical debt, but I absolutely detest being forced into technical debt. This is especially true if I have not yet been allowed to deal with whatever debt already exists in a given system.

The worst thing to hear in a meeting with managers or project leads is "We'll deal with 'that' issue after launch" or "Yeah, we need to hit the deadline, so let's just get this thing working, and deal with the fallout later". Dealing with problems retroactively is always going to be more expensive, and some problems simply aren't fixable after a product launch. If you're going into production with known bugs or defects, at least let the technical people choose which bugs.

[+] eyan|9 years ago|reply

Refactor. Joel says it far better: http://www.joelonsoftware.com/articles/fog0000000069.html

I see sibling comments on big rewrites. No. I see sibling comments on service based replacements. Still no.

Put structure in. Add tests. Rinse. Repeat.

[+] eru|9 years ago|reply

Have you looked into Facebook's Hack? From what I've seen, Facebook created the language to help with exactly your kind of PHP technical debt problem.

That's a technical thing you can try, in addition to any of the social approaches the other comments are suggesting.

Using Hack makes the incremental approach more bearable for your people, and makes it bear more fruit than just staying in plain PHP.

[+] flukus|9 years ago|reply

Make sure there is time dedicated to cleaning up the bad stuff. All the platitudes about code quality are worthless without dedicating time to improve things.

[+] adam77|9 years ago|reply

have the team come up with a list of achievable meaningful milestones (eg. 'eliminate use of nasty obsolete library X'), ensure some time is spared to progress them; it'll become clear if the team is net paying off or accruing

also, find someone who thrives on eliminating crap and let them get stuck in

[+] jonaf|9 years ago|reply

This is a tough spot to be in. From the perspective of a tech lead / individual contributor (I don't consider myself in the company of a Fellow or C-level exec), I've witnessed this kind of situation before and learned a few lessons from it, which I'd like to share here. (Bear in mind, I use the phrase "lessons learned" here loosely, as I could have drawn incorrect conclusions from my experiences, so please pay more attention to the explanations of the points than to the takeaways.)

- Don't bite off more than you can chew. You could "replatform" and try to replace everything that currently works with new tech. In my experience, while the result is an admirable amount of sophisticated technology, the value for the business is unrealizable for a significant amount of time (which usually results in "bad things," like your stock going down, employees being unhappy/leaving due to thinking that it's not going to work out, sales not hitting targets because they don't believe in the product they're selling, customers having more "strength" during contract/sales negotiations, etc). I have observed ~4 years of significant (maybe over a hundred engineers) investment in an effort to replace the entire technology of a relatively young company. I have read many stories of companies doing this and going belly-up as what appears to be a direct result (few companies survive the process; Uber would be an example of a company that is doing this and will survive[1] -- Steve Blank has some particularly appropriate reading material[2]). The problem is the business must continue to grow and sell its product during this time (stable business is important, but growth is critical -- and you can't focus on growth when you're rewriting your technology from the ground up). This seems obvious, but the moment I hear someone say "greenfield" when they also have significant tech debt, I raise an eyebrow of suspicion.

- So, following that, do bite off very small chunks and slowly decompose your tech debt into whatever your well-paid, highly-competent team lead(s) recommend in terms of architecture. For example, if you have a huge legacy application, slowly separate each logical component into its own microservice (assuming your team leads believe microservices are the best architecture for your use-case, etc).

- Freelancers/students/interns/contractors are great! But don't let them design anything. I say this not as a jab to anyone in this category. I work with people in these categories daily. However, it's critical that their work is only implementation and that it is written in a way that has absolute minimal cognitive overhead. The reason is because if you hire someone temporarily to produce a hoozit that does Thing, then in the absolute best case, they will produce exactly that, but you (and your well-paid, highly-competent staff) will have absolutely no idea or understanding as to how or why the hoozit does Thing, or how to modify the hoozit to be a whatsit, or make the hoozit do Otherthing. Sure, you could figure it out. But I posit the cost of doing so is greater than the cost of having done it yourself, even if it takes longer. And now that I think of it, this point is supported strongly by your very own experience already: temp workers always produce technical debt, even in the absolute best case (an example of a worse case is that they produce an unmaintainable/incomprehensible/unmodifiable hoozit that does not do Thing or only does Thing in some conditions a.k.a. being riddled with bugs). Competency or compensation has little or no effect. My belief is that this is often because temp workers know their position is temporary and will strive to achieve exactly Thing when building your hoozit -- they are economically motivated to do so. They are not economically motivated to build your hoozit in such a way that it can become a whatsit later on.

- Remember to deliver value -- in particular, drive growth. This is super important in the technology industry. You should always have some amount of your engineering muscle focused on delivering new value. Sometimes that is reducing technical debt, or decomposing your legacy application, or building new products. Sometimes it's integrating a third party's product with yours in some way, as "uncool" as that may sound. Sometimes you are in a bind where you can't produce new value without dealing with technical debt. So deal with the technical debt in the most sensible way (i.e., not producing additional technical debt, but also not forgoing any work to reduce your technical debt in the pure interest of improving the top line). If you think of your legacy application as delivering some value, but you don't ever add new features to it, consider your competition and whether your business will lose critically due to lack of innovation.

- I believe that the best engineers care a lot more about understanding the business and how their work aligns with it. If you ask your team, I expect they will tell you that they would rather work on the mind-numbing effort, in some way, if it is the best thing for the business. Junior software developers will always prefer greenfield. More senior software developers will seek the optimal solution. (Similarly to how junior engineers will prefer Shiny New Tech X, whereas senior engineers will only use Shiny New Tech X when it really, really makes sense to do so and there is a strong alignment with the business and/or low risk to doing so -- like in a new product/microservice that is less critical than your core product offerings, for instance.)

- Don't fall into the trap of believing that the only way to grow the business is to abandon/migrate away from legacy applications. At the least, move your users/customers as you decompose/replace legacy software/services. Avoid trying to make big migrations, especially wholesale. (OK, maybe I'm repeating my first point here...)

Separately, I'm curious why the CEO having written any of the code matters if he is no longer contributing code (I'm assuming he is not, since you mentioned he stopped learning new techniques). Or did you mean that he stopped having free time to learn new techniques and therefore stopped contributing code? I would think the CEO needs to stop contributing code as soon as you have enough engineers to meet your minimum desired sprint points (or whatever metric you use for productivity). I certainly don't think that not having learned new techniques necessitates the cessation of contribution, although learning new techniques is a likely byproduct of contribution (due to experience, research, implementation itself, code reads, etc). But, I'm digressing from the topic.

Anyway, hopefully my commentary/experience is helpful to you. Best of luck!

[1] https://eng.uber.com/soa/

[2] https://steveblank.com/2011/01/25/startup-suicide-%E2%80%93-...

[+] nicobn|9 years ago|reply

I've been in this situation many times. Hit me up at [my username] @ gmail.com if you want to talk.

[+] blunte|9 years ago|reply

"They know that they’re going to have to manufacture endless explanations for why seemingly simple things take them a long time." This is what kills me. You can't say to the boss, "Your beloved senior dev built this arcane and fragile system, so everything I do takes forever." Instead you have to find diplomatic/meaningless explanations for why you're moving so slowly.

[+] quantumhobbit|9 years ago|reply

I've found that I can win over my immediate superiors with low level grumbling about real technical issues in the code base. They eventually internalize the state of the code base and have realistic expectations for how long things will take. The key is to diplomatically voice all your WTF moments, and have a boss technical enough to get it.

Problem is their bosses still think that the spaghettified joke that is the internal framework we have to use is manna from heaven and would take a very dim view of anyone caught trashing it in the open.

[+] bradleyjg|9 years ago|reply

From the bosses' point of view it is very difficult to tell the difference between "we have a crap codebase" and "the new guy sucks at reading code and is going to want to rewrite everything he touches".

[+] michaelfeathers|9 years ago|reply

I'm convinced that the issue overall is lack of transparency about design quality. In an org with non-technical leadership, technical staff should periodically present assessments of their systems' readiness for change. Non-tech people need to understand technical debt and see how it changes over time.

In my experience, companies with technical founders tend to have cultures that handle this issue better.

[+] simula67|9 years ago|reply

I wonder if it is also the case that if the code quality is high, people are happy and they stay. This means fewer openings are available in those teams. Therefore, if you take a new job, it is more probable that the code is bad and there is a revolving door of developers who have tried to make it better, failed, left and created an opening for you. The vicious cycle of bad jobs.

[+] 0xfeba|9 years ago|reply

Personal Anecdote:

Was employed, but searching for a new job. Interviewed a few places that required coding on whiteboards (ugh). Then interviewed at a place where they asked me only a few technical questions, but mainly about what I did at my previous position. Then one of the technical questions was what did HTML5 change about using <b> tags and such. And I said, uh, just use CSS. No, "HTML5 introduced <strong> for better semantics". Didn't disagree at the time, just said, "Interesting.", and looked it up later. Yeah, the strong tag has been there since at least HTML3.2

Whatever, they offered me nearly double what I was earning before, plus a sizable bonus. I figured it couldn't be that bad.

Well, it's bad. 4 people quit in my first week (out of, say, 20-25 devs). One was one of the persons who interviewed me. Months later, about 5 more people have left and been replaced. We've also added new members. My project isn't a trainwreck, but the main headache is the main project with a 30 year old codebase.

It's a mess. They are trying to revamp it in situ, but it's ASP and SQL and most of the business logic is in SQL stored procedures. They've really done some nice code* in the revamp but all it does is add layers on top of an eventual SP call.

So my project is humming along nicely, it's scheduled for a few years and I'm only going to be here a couple years anyway, so it's a nice step in my career for me. I just got lucky. I'd hate to work on the main product. Evidently everyone else does too, because they keep churning through people.

*They don't believe in comments. The methods and stuff are clearly written, e.g. CreateUserAndReturnUserRef and verbose stuff like that, but for some reason they don't have comments. Guess what other code doesn't have comments? The classic ASP and stored procedures. Great habit to keep! Even my new project lacks comments. I add them, but no one else does. It's very odd. I've brought it up and gotten various excuses ("Comments rot, so we don't use them," "I just don't have time for comments," "Yeah I should add those").

[+] Fuxy|9 years ago|reply

As I currently work at a company in this exact situation I can confirm it is very true.

And to validate the article even more I am currently searching for better opportunities where I can work with better and more modern technologies.

[+] whack|9 years ago|reply

Honestly, I would get bored working at a place where everything is "easy". I actually enjoy the challenge of working on awful code-bases, and trying to improve upon it, while still preserving the functionality and not breaking anything.

That said, if I had to work with awful code and I wasn't allowed to touch it or make it better, I would definitely leave in a heartbeat.

[+] jcbeard|9 years ago|reply

Bad management, oversight, and architecting returns bad code. Often team leads know better, but are driven by deadlines and the lack of a skilled workforce (with little time to train them) to produce a product. Of course this leads to turn-over...eventually. People will work for a year, realize that nothing they want to accomplish will happen quickly because of resourcing, then they'll run away. I really don't agree that the human cost of technical debt is a new concept. It's foremost in my mind when I think about giving tools to employees to add new features. Well written article, but not really anything new.

[+] flukus|9 years ago|reply

Stuff like this is why I wish you could review the company's code before joining. Getting hired, discovering how bad everything is and then starting the job hunt again is awful for everyone.

[+] brandall10|9 years ago|reply

For most job offers I've gotten I've asked to review the source code post-offer. I've even done this in a couple interviews -- can't recall ever being denied.

In one particular instance I came back to review the code for a product that required a level 3 security clearance; the hiring manager happily complied, letting me spend over 2 hours alone studying the code on a workstation in a side office. One reason I passed on that position was due to the code being more spaghetti than I would have preferred.

[+] peterbotond|9 years ago|reply

...then the companies having bad code will make a few good ones for show. An interview is like a first date: both parties try to impress without mentioning the critical (I could have said important too) parts.

[+] bunderbunder|9 years ago|reply

You should insist on a chance to speak to your potential colleagues without anyone listening over your shoulder.

[+] caente|9 years ago|reply

Copying my comment from the OP

> Analysts and project managers might account for technical debt when discussing slipped deadlines.

No they won't, you cannot quantify technical debt, it's a metaphor.

I completely agree with the rest of the post. But "technical debt" is not a cause, nor a valid metaphor to share with management. Other bad metaphors are "building" and "architecture". We are not building anything, we are only writing algorithms that computers will follow. When we write down those algorithms, we are also encoding the inner workings of the company. The true knowledge of how a company works is not only in its documentation or in the heads of their employees, it's also in its automated processes.

The automated processes are the reason why a company can be competitive nowadays. People need to be able to understand those processes, those algorithms, from the code; documentation is never enough, and is never up to date. The code is the primary source of information. If it's not legible, if it becomes arcane knowledge, a black magic that only a consultant can "fix", then the company grinds to a halt, it withers and is crushed by competitors. Being careless about code quality is not about the suffering of the developers; it means that you are burying vital knowledge of your company, and eventually no employee, no consultant, will be able to dig it out.

If we are trying to figure out how to deal with managers who don't care about the company, but only about absurd deadlines that their managers gave them, then we need to start pushing back, start saying NO. Politics are unavoidable, and often necessary; not all decisions can be made for technical reasons, but we need to start imposing the reality of the code: you cannot rush it, or you are killing yourself.

[+] ddebernardy|9 years ago|reply

> For a manager, a code base high in technical debt means that feature delivery slows to a crawl, which creates a lot of frustration and awkward moments in conversation about business capability. For a developer, this frustration is even more acute. Nobody likes working with a significant handicap and being unproductive day after day, and that is exactly what this sort of codebase means for developers.

The very same holds for managers and teams in companies that need to deal with organizational debt:

https://steveblank.com/2015/05/19/organizational-debt-is-lik...

[+] FollowSteph3|9 years ago|reply

The concept of technical debt is easy to convey to management; the hard part is conveying the scale of the debt, because it's an abstract value. You don't have x units of debt. And because there's no actual value to measure, it's easy to keep adding to it: it's easily viewed as the same debt.

To give you an analogy, imagine you have a credit card. As you spend you increase your debt, but you never get to know what your balance is. All you know is how much you pay per month. In fact, even that number would be blurry and fluctuate. That's what management sees from their perspective. As a result it's easy to keep adding debt because it has no real value: you don't see that you're $1000, $10k, $100k, etc. in debt, just that you have to pay something each month. Yes, each month you have less to spend, but it's a lot less than you spent that month. Last month you had to pay $100 but this month it's $101, and you don't have to pay that $50 one-time hit. The extra $1/mth is easier. You kinda forget that in a year it's now an extra $2-$3/mth. And again, you never see your balance, so you don't know how much you have in debt. Your spouse keeps saying you're way in debt but you have no idea of the scale. It's very easy to increase your debt this way.
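The arithmetic behind the analogy can be played with in a toy model. This is only a sketch under loud assumptions that are not in the comment itself: exactly one shortcut is taken every month, each one avoids a fixed up-front cost and adds a fixed amount to every later monthly payment, and the function names and default numbers (a $100 base payment, $1 added per shortcut) are invented for illustration.

```python
from typing import List


def monthly_payments(months: int,
                     base_payment: float = 100.0,
                     added_per_shortcut: float = 1.0) -> List[float]:
    """Payment due in each month, assuming one new shortcut per month."""
    return [base_payment + added_per_shortcut * m for m in range(1, months + 1)]


def total_extra_paid(months: int, added_per_shortcut: float = 1.0) -> float:
    """Sum of everything paid above the base payment over the period."""
    # 1 + 2 + ... + months, scaled by the per-shortcut increase.
    return added_per_shortcut * months * (months + 1) / 2


first_year = monthly_payments(12)
extra_in_first_year = total_extra_paid(12)
```

In this toy model each month's payment is only slightly higher than the previous one, which is the part management sees, while the number of shortcuts behind that payment never shows up anywhere.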

Unfortunately I don't have an answer as to how you can value the technical debt so that the business people can appreciate the scale. I don't think as developers we can even measure it ourselves accurately, it's more of a feeling, a scale if you will.

[+] sigsergv|9 years ago|reply

Most articles about technical debt should actually be about technical default. Tech debt is a good technical management instrument; it just needs to be paid, and everyone involved must understand that. Delaying payments is as bad as delaying regular financial debt payments.

[+] techterrier|9 years ago|reply

Completely agree with this sentiment. Not all tech debt is bad - and it's often inevitable. I wrote some stuff about this notion: https://medium.com/@MostlyHarmlessD/on-technical-debt-7bac65...

[+] coderot|9 years ago|reply

This article hits close to home!

What should be done in a company that's basically committed every error you could commit? What's an engineer to do? Just switch jobs?

I joined a company a few years ago that had hit a wall with its tech stack after 8 years. They brought in a new technical lead who convinced everyone to approve a "rewrite everything from scratch on new stack" approach (even though it rarely works).

2 Years and $40-50m later:

- The thing's nowhere near done. Customers all still using old legacy product that's not been updated in 2-3 years now since resources were diverted to rewrite

- 88% of the eng org has turned over

- current eng org doesn't know how 20-40% of the codebase works

- customers have caught on and started leaving en masse

- Rest of company has turned over as well as they realized nothing useful was going to come out of engineering for years more if ever

It's a bit confusing. What the heck should we do at this point?

I take some solace in knowing that I've seen firsthand what NOT to do in situations like this.

[+] flukus|9 years ago|reply

> What should be done in a company that's basically committed every error you could commit? What's an engineer to do? Just switch jobs?

Sometimes that's all you have the power to do. Why are customers leaving en masse if the old product is still functioning?

It sounds like the rewrite got stuck in the pattern where it has to do 100% of the original and then some. Just like a new product, a rewrite has to hit that MVP mark, just not as urgently. They also have to be able to throw out the cruft and not import every bad idea from v1.

[+] lmm|9 years ago|reply

Heh I worked at a company that did something like that.

Tell the people above the tech lead, politely, that the failure is a predictable consequence of the decision to rewrite from scratch, and that what they need to do is abandon the rewrite and pour resources into the "legacy" version to fix it. And be prepared to switch jobs. Not necessarily in that order.

[+] danieltillett|9 years ago|reply

Fire the tech lead for one if he has not been already.

[+] clifanatic|9 years ago|reply

> Just switch jobs?

Meet the new job, same as the old job.

[+] boothead|9 years ago|reply

This is a great article. I've also seen tech debt exacerbated by poorly implemented agile practices. One of the worst projects I worked on kept kicking the can down the road because of a misguided idea that everything should be a customer deliverable. So not only were we forced to work with crap, but we were disempowered from making it any better! It was a nightmare: motivation and team cohesion were non-existent and yes, turnover was very high!

[+] clifanatic|9 years ago|reply

> everything should be a customer deliverable

And everything should be delivered in a 100% predictable timeframe and that predictable timeframe should be about a day.

[+] gearhart|9 years ago|reply

> To tie this back to the tech debt metaphor, think of someone with mountains of debt trying to explain being harassed by creditors. It’s embarrassing, which is, in turn, demoralizing.

Seems that a more apt use of the analogy would be "think of someone with mountains of debt trying to explain why they can't afford to do the same things as their colleagues"

[+] sigsergv|9 years ago|reply

The human cost of unpaid tech debt.

[+] ubercode5|9 years ago|reply

"If you’re not already familiar with the concept of technical debt" ohh how I envy thee.

[+] clifanatic|9 years ago|reply

As much as I love the metaphor, it's been quite a while since I first noticed the "eye-twitching" among non-technical management types whenever the term "technical debt" comes up. I can almost hear them thinking, "oh, boy, more spoiled programmer bullshit for me to deal with".

94 comments