top | item 42627227

Mistakes engineers make in large established codebases

811 points | BerislavLopac | 1 year ago | seangoedecke.com

368 comments

[+] peterldowns|1 year ago|reply
I agree that consistency is important — but what about when the existing codebase is already inconsistent? Even worse, what if the existing codebase is both inconsistent and the "right way to do things" is undocumented? That's much closer to what I've experienced when joining companies with lots of existing code.

In this scenario, I've found that the only productive way forward is to do the best job you can, in your own isolated code, and share loudly and frequently why you're doing things your new different way. Write your code to be re-used and shared. Write docs for it. Explain why it's the correct approach. Ask for feedback from the wider engineering org (although don't block on it if they're not directly involved with your work.) You'll quickly find out if other engineers agree that your approach is better. If it's actually better, others will start following your lead. If it's not, you'll be able to adjust.

Of course, when working in the existing code, try to be as locally consistent as possible with the surrounding code, even if it's terrible. I like to think of this as "getting in and out" as quickly as possible.

If you encounter particularly sticky/unhelpful/reticent team members, it can help to remind them that (a) the existing code is worse than what you're writing, (b) there is no documented pattern that you're breaking, (c) your work is an experiment and you will later revise it. Often asking them to simply document the convention that you are supposedly breaking is enough to get them to go away, since they won't bother to spend the effort.

[+] sfink|1 year ago|reply
Hopefully, you have a monorepo or something with similar effects, and a lack of fiefdoms. In that case, if the current way is undocumented and/or inconsistent, you make it better before or while adding in your new approach. If there are 4 ways to do the same thing and you really want to do it a different way, then replace one of those ways with your new one in the process of adding it. For extra credit, get to the point where you understand why you can't replace the other 3. (Or if you can, do it! Preferably in followups, to avoid bundling too much risk at once.)

A lot of inconsistency is the result of unwillingness to fix other people's stuff. If your way is better, trust people to see it when applied to their own code. They probably have similar grievances, but it has never been a priority to fix. If you're willing to spend the time and energy, there's a good chance they'll be willing to accept the results even if it does cause some churn and require new learning.

(Source: I have worked on Firefox for a decade now, which fits the criteria in the article, and sweeping changes that affect the entire codebase are relatively common. People here are more likely to encourage such thinking than to shoot it down because it is new or different than the status quo. You just can't be an ass about it and ignore legitimate objections. It is still a giant legacy codebase with ancient warts, but I mostly see technical or logistical obstacles to cleaning things up, not political ones.)

[+] digging|1 year ago|reply
> If it's actually better, others will start following your lead.

Not really my experience in teams that create inconsistent, undocumented codebases... but you might get 1 or 2 converts.

[+] agentultra|1 year ago|reply
> In this scenario, I've found that the only productive way forward is to do the best job you can, in your own isolated code, and share loudly and frequently why you're doing things your new different way.

Now you have N+1 ways.

It can work if you manage to get a majority of a team to support your efforts, create good interfaces into the legacy code paths, and most importantly: write meaningful and useful integration tests against that interface.
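A minimal sketch of that idea (all names hypothetical, not from the book or the thread): put a thin, documented interface in front of the legacy code path, and pin its observable behavior with a test against that interface rather than against the legacy internals.

```python
# Hypothetical sketch: a thin interface over a legacy code path,
# plus an integration-style test pinning its observable behavior.

def legacy_lookup(raw_record: str) -> str:
    """Stand-in for a messy legacy function we don't want to touch."""
    # Legacy quirk: trims whitespace and upper-cases, undocumented.
    return raw_record.strip().upper()

class RecordService:
    """New, documented interface; all new callers go through here."""

    def lookup(self, record_id: str) -> str:
        if not record_id:
            raise ValueError("record_id must be non-empty")
        return legacy_lookup(record_id)

# Test against the interface, not the legacy internals:
service = RecordService()
assert service.lookup("  abc ") == "ABC"  # pins the legacy quirk
```

Once callers are funneled through `RecordService`, the legacy function can be rewritten behind it without touching the rest of the codebase.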

Michael Feathers wrote a wonderful book about this called, Working Effectively with Legacy Code.

I think what the author is trying to say with consistency is to avoid adding even more paths, layers, and indirection in an already untested and difficult code base.

Work strategically, methodically, and communicate well as you say and it can be a real source of progress with an existing system.

[+] mgfist|1 year ago|reply
I rarely see large 10M+ LOC codebases with any sort of strong consistency. There are always flavors of implementations and patterns all over the place. Hell, it's common to see some functionality implemented multiple times in different places.
[+] LouisSayers|1 year ago|reply
> but what about when the existing codebase is already inconsistent?

Then you get people together to agree what consistent looks like.

I find the easiest way to do this is to borrow someone else's publicly documented coding conventions e.g. Company ABC.

Then anyone disagreeing isn't disagreeing with you, they're disagreeing with Company ABC, and they (and you) just have to suck it up.

From there on in, you add linting tools, PR checks etc for any new code that comes in.

[+] HideousKojima|1 year ago|reply
My approach is what I call defensive programming, with a different meaning than the usual usage of the term. I assume that my coworkers are idiots that aren't going to read my documentation, so I make all public classes and methods etc. as idiot-proof as possible to use. Hasn't saved me from every issue caused by my teammates never reading my docs or asking me questions, but it's definitely prevented several.
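A toy illustration of that kind of defensive API design (class and parameter names are made up): validate inputs at the public boundary and fail with a message that tells the caller how to fix the mistake, instead of letting a bad value propagate deep into the system.

```python
import json

class ReportExporter:
    """Public API hardened against misuse: validate early, fail loudly."""

    VALID_FORMATS = frozenset({"csv", "json"})

    def export(self, rows, fmt="csv"):
        # Reject bad arguments up front, with an actionable message.
        if fmt not in self.VALID_FORMATS:
            raise ValueError(
                f"fmt must be one of {sorted(self.VALID_FORMATS)}, got {fmt!r}"
            )
        if rows is None:
            raise TypeError("rows must be an iterable of dicts, not None")
        rows = list(rows)
        if fmt == "json":
            return json.dumps(rows)
        if not rows:
            return ""
        header = ",".join(rows[0])
        body = "\n".join(",".join(str(v) for v in r.values()) for r in rows)
        return f"{header}\n{body}"
```

A coworker who never reads the docs still gets told exactly what went wrong at the call site, not three layers down.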
[+] stravant|1 year ago|reply
Sometimes people are too afraid of attempting to make it consistent.

I've done several migrations of things with dozens of unique bespoke usage patterns back to a nice consistent approach.

It sometimes takes a couple straight days of just raw focused code munging, and doesn't always end up being viable, but it's worth a shot for how much better a state it can leave things in.

[+] lamontcg|1 year ago|reply
It is terrible to just do this on your own, particularly as the n00b.

If there are 5 different standards in the codebase, don't just invent your own better way of doing things. That is literally the xkcd/Standards problem. Go find one of the people who have worked there the longest and ask which of the 5 existing standards are most modern and should be copied.

And as you get more experience with the codebase you can suggest updates to the best standard and evolve it. The problem is that you then need to own updating that whole standard across the entire codebase. That's the hard part.

If you aren't experienced enough with the codebase to be aggressive about standardization, you shouldn't be creating some little playground of your own.

[+] mihaaly|1 year ago|reply
Consistency in huge legacy codebase is like virginity in a brothel: desired but non-existent.
[+] whilenot-dev|1 year ago|reply
Consistency is never the sole reason to change something, there's always consistency and at least one of the following:

- coherency

- type safety (shout-out to dynamically typed languages that adapted static typing)

- concurrency

- simplicity

- (and more)

> If it's actually better, others will start following your lead.

A lot of people don't want to improve the quality of their output, for various reasons... some are happy to have something "to pay the bills", some don't want to use a programming language to its full extent, some have a deeply rooted paradigm that has worked for 10 years already ("static types won't change that"), others are scared of concurrency, etc. For some people there's nothing to worry about when a server can be blocked by a single request for 60 secs.

[+] dotancohen|1 year ago|reply

> your work is an experiment and you will later revise it

I advise against this if you have not been allocated the time or budget to revise the code. For one thing, you're lying. For another thing, were you hired to be a part of the contributing team, or hired to be part of a research team doing experiments in the contributing team's codebase and possibly deploying your experiment on their production systems?

I would immediately push back on any new guy who says this, no matter how confident he seems that his way is the right way.

[+] DrBazza|1 year ago|reply
> but what about when the existing codebase is already inconsistent

It depends.

If it's a self contained code base, automatic refactoring can safely and mechanically update code to be consistent (naming, and in some cases structurally).

If it's not self contained and you're shipping libraries, i.e. things depend on your code base, then it's more tricky, but not impossible.

"Lean on tooling" is the first thing you should do.
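As a toy version of "lean on tooling" (a sketch, not a real linter): Python's standard `ast` module is enough to flag naming that breaks a convention mechanically, so consistency is enforced by a CI check rather than by reviewers' memory.

```python
# Toy consistency check: flag function names that break snake_case.
import ast
import re

SNAKE = re.compile(r"^[a-z_][a-z0-9_]*$")

def inconsistent_names(source: str) -> list:
    """Return names of top-level/nested functions violating snake_case."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef) and not SNAKE.match(node.name)
    ]

sample = "def getUser():\n    pass\n\ndef get_order():\n    pass\n"
assert inconsistent_names(sample) == ["getUser"]
```

Real projects would reach for an off-the-shelf linter, but the principle is the same: encode the convention once, run it on every PR.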

[+] mjr00|1 year ago|reply
> The other reason is that you cannot split up a large established codebase without first understanding it. I have seen large codebases successfully split up, but I have never seen that done by a team that wasn’t already fluent at shipping features inside the large codebase. You simply cannot redesign any non-trivial project (i.e. a project that makes real money) from first-principles.

This resonates. At one former company, there was a clear divide between the people working on the "legacy monolith" in PHP and the "scalable microservices" in Scala/Go. One new Scala team was tasked with extracting permissions management from the monolith into a separate service. It was estimated to take 6-9 months. 18 months later, the project was canned without delivering anything. The team was starting from scratch, had no experience working with the current monolith's permissions model, and could not get it successfully integrated. Every time an integration was attempted, they found a new edge case that was totally incompatible with the nice, "clean" model they had created for the new service.

[+] cratermoon|1 year ago|reply
I worked at a company with a Rails monolith that underwent a similar scenario. A new director of engineering brought in a half dozen or so of his friends from his previous employer to write Scala. They formed a clique and decided Things Were Going to Change. Some 18 months and 3 projects later, nothing they worked on was in production. Meanwhile, the developer who was quietly doing ongoing maintenance on the monolith had gradually broken out some key performance-critical elements into Scala and migrated away from the Ruby code for those features. Not only had that gone into production, it made maintenance far easier.
[+] newaccountman2|1 year ago|reply
Am I naive for thinking that nothing like that should take as long as 6-9 months in the happy case and that it's absurd for it to not succeed at all?
[+] pablobaz|1 year ago|reply
In my experience with very large codebases, a common problem is devs trying to improve random things.

This is well intentioned. But in a large old codebase finding things to improve is trivial - there are thousands of them. Finding and judging which things to improve that will actually have a real positive impact is the real skill.

The terminal case of this is developers who, in the midst of another task, try to improve one little bit, but pulling on that thread leads them to attempt bigger and bigger fixes that are never completed.

Knowing what to fix and when to stop is invaluable.

[+] ninalanyon|1 year ago|reply
> common problem is devs trying to improve random things.

Been there, been guilty of that at the tail end of my working life. In my case, looking back, I think it was a sign of burnout and frustration at not being able to persuade people to make the larger changes that I felt were necessary.

[+] Kinrany|1 year ago|reply
Do you think boyscouting, "leave it better than you found it" is misguided as well?
[+] Animats|1 year ago|reply
> Single-digit million lines of code (~5M, let’s say)

> Somewhere between 100 and 1000 engineers working on the same codebase

> The first working version of the codebase is at least ten years old

That's 5,000 to 50,000 lines of code per engineer. Not understaffed. A worse problem is when you have that much code, but fewer people. Too few people for there to be someone who understands each part, and the original authors are long gone. Doing anything requires reverse engineering something. Learning the code base is time-consuming. It may be a year before someone new is productive.

Such a job may be a bad career move. You can spend a decade learning a one-off system, gaining skills useless in any other environment. Then it's hard to change jobs. Your resume has none of the current buzzwords. This helps the employer to keep salaries down.

[+] lmm|1 year ago|reply
I've worked in codebases like this and disagree. Consistency isn't the most important, making your little corner of the codebase nicer than the rest of it is fine, actually, and dependencies are great - especially as they're the easiest way to delete code (the article is right about the importance of that). What's sometimes called the "lava layer anti-pattern" is actually a perfectly good way of working, that tends to result in better systems than trying to maintain consistency. As Wall says, the three cardinal virtues of a programmer are laziness, impatience, and hubris; if you don't believe you can make this system better then why would you even be working on it?

Also if the system was actually capable of maintaining consistency then it would never have got that large in the first place. No-one's actual business problem takes 5M lines of code to describe, those 5M lines are mostly copy-paste "patterns" and repeated attempts to reimplement the same thing.

[+] jimbokun|1 year ago|reply
Pulling in lots of dependencies will eventually grind progress on features to a halt as you spend more and more time patching and deploying fixes for vulnerabilities. I believe the probability of seeing new vulnerabilities is pretty much linear in the number of dependencies you have.
[+] edanm|1 year ago|reply
> No-one's actual business problem takes 5M lines of code to describe, those 5M lines are mostly copy-paste "patterns" and repeated attempts to reimplement the same thing.

I'm pretty sure this is trivially untrue. Any OS is probably more than 5M lines (Linux is 27.8 million lines, according to a random Google search). Facebook is probably more lines of code. Etc.

[+] TheBigSalad|1 year ago|reply
I work on a 5M+ line code base. It's not copy/paste or the same problems solved in different ways. It's a huge website with over 1K pages that does many, many things our various clients need.
[+] devnullbrain|1 year ago|reply
>making your little corner of the codebase nicer than the rest of it is fine, actually

As TFA points out, you might find out that you've made your little corner worse, actually.

[+] IvyMike|1 year ago|reply
The "cardinal mistake is inconsistency" point is 100% true. We used to call the guiding philosophy of working in these codebases "When in Rome".
[+] dav|1 year ago|reply
I have three maxims that basically power all my decisions as an engineer:

1. The three C’s: Clarity always, Consistency with determination, Concision when prudent.

2. Keep the pain in the right place.

3. Fight entropy!

So in the context of the main example in this article, I would say you can try to improve clarity by e.g. wrapping the existing auth code in something that looks nicer in the context of your new endpoint but try very hard to stay consistent for all the great reasons the article gives.

[+] Kon5ole|1 year ago|reply
Sometimes the right approach is to keep the consistency. Other times, that approach is either impossible or catastrophic.

IMO software development is so diverse and complex that universal truths are very very rare.

But to us programmers, anything that promises to simplify the neverending complexity is very tempting. We want to believe!

So we're often the equivalent of Mike Tyson reading a book by Tiger Woods as we look down a half-pipe for the first time. We've won before and read books by other winners, now we're surely ready for anything!

Which leads to relational data stored in CouchDB, data layers reimplemented as microservices, simple static sites hosted in Kubernetes clusters, spending more time rewriting tests than writing new features, and so on.

IMO, most advice in software development should be presented as "here's a thing that might work sometimes".

[+] protonbob|1 year ago|reply
I don't have a real critique because I don't have that many years in a codebase the size of OP's (just two years). But I struggle with the advice not to try to make a clean section of the codebase that doesn't depend on the rest of the application.

Isn't part of good engineering trying to reduce your dependencies, even on yourself? In a latter part of the post, OP says to be careful tweaking existing code, because it can have unforeseen consequences. Isn't this the problem that having deep vertical slices of functionality tries to solve? High cohesion in that related code is grouped together, and low coupling in that you can add new code to your feature or modify it without worrying about breaking everyone else's code.

Does this high cohesion and low coupling just not really work at the scale that OP is talking about?

[+] gleenn|1 year ago|reply
It's one thing to reduce dependency and another to have reduced consistency. If you have 10 web routes and 1 behaves differently, it doesn't matter if the code is cross coupled or not, it matters if it behaves similarly. Does it return the same status codes on error? Does it always return JSON with error messages inside? Do you auth the same way? The implementation can be wholly separate but end users will notice because logic on their side now has to special-case your 11th endpoint because you returned HTTP 20x instead of 40x on error. Or when you realize that you want to refactor the code to DRY it (Don't Repeat Yourself), now you can't reduce all the duplication because you have bespoke parts.
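A minimal sketch of that kind of behavioral consistency (endpoint names and response shapes are invented for illustration): one shared helper that every endpoint funnels errors through, so the 11th endpoint can't quietly return a different shape.

```python
import json

def error_response(status: int, message: str):
    """Single place that defines the error contract for all endpoints."""
    return status, json.dumps({"error": {"message": message}})

def get_user(user_id):
    if user_id is None:
        return error_response(400, "user_id is required")
    return 200, json.dumps({"id": user_id})

def get_order(order_id):
    # Same shape and status conventions as get_user, by construction.
    if order_id is None:
        return error_response(400, "order_id is required")
    return 200, json.dumps({"id": order_id})

status, body = get_user(None)
assert status == 400
assert json.loads(body) == {"error": {"message": "user_id is required"}}
```

The implementations can still be wholly separate; what clients see is uniform.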
[+] mbivert|1 year ago|reply
I think the gist of it is humility: as a newcomer, you don't really know what's out there and why, and there are often good reasons for things to be the way they are. Not always, but often enough that restraint beats originality. This doesn't mean relinquishing "good engineering habits" either.

Now, once you have a deeper understanding of the codebase, you'll know when and why to break away from existing patterns, but in the beginning phase, it's a good habit to start by learning carefully how things are designed and why.

[+] Salgat|1 year ago|reply
Consistency makes code predictable and reduces mental overhead. It doesn't mean you have to write it poorly like the rest of the codebase, but it does mean using the same general practices as the rest of the codebase. Think of it like using knockoff legos vs the real thing. They both work interchangeably which makes it easy to use them together, but you'd prefer to use the nicer lego pieces as much as possible in your builds because the material is higher quality, tighter tolerances, just overall works better even if it's the same shape as the knockoff pieces.
[+] Cthulhu_|1 year ago|reply
These are two different concepts though; reducing dependencies is good, but you can have minimal dependencies AND have the code look / feel / behave like the rest of the codebase. Always remember, it's not your code. Assume someone else will need to read / maintain it. Thousands might. You might have made the perfect section of code, then get an offer you can't refuse or get hit by a bus.
[+] mrkeen|1 year ago|reply
Nope, you've got it.

Code-consistency is a property just like any other property, e.g. correctness, efficiency, testability, modifiability, verifiability, platform-agnosticism. Does it beat any of the examples I happened to list? Not a chance.

> worrying about breaking everyone else's code

You already said it, but just to expand: if you already have feature A, you might succeed in plumbing feature B through feature A's guts. And so on with feature C and D. But now you can't change any of them in isolation. When you try to fix up the plumbing, you'll now break 4 features at once.

[+] hnanon98791|1 year ago|reply
> Single-digit million lines of code (~5M, let’s say)

> Somewhere between 100 and 1000 engineers working on the same codebase

> The first working version of the codebase is at least ten years old

> The cardinal mistake is inconsistency

Funny enough, the author notes the reason why consistency is impossible in such a project and then proceeds to call it the cardinal mistake.

You cannot be consistent in a project of that size and scope. Full stop. Half those engineers will statistically be below average and constantly dragging the codebase towards their skill level each time they make a change. Technology changes a lot in ten years, people like to use new language features and frameworks.

And the final nail in the coffin: the limits of human cognition. To be consistent you must keep the standards in working memory. Do you think this is possible when the entire project is over a million LOC? Don't be silly.

There's a reason why big projects will always be big balls of mud. Embrace it. http://www.laputan.org/mud/

[+] maxwellg|1 year ago|reply
I never really understood putting consistency on a pedestal. It's certainly nice when everything operates exactly the same way - but consistency for consistency's sake is awful to work in too. If a team realizes that logging library B is better than library A, but NEVER switches from A to B because of consistency concerns, then in two years they'll still all be using inferior tools and writing worse code. Similarly, if a team DOES decide to switch from A to B, they probably shouldn't spend months rewriting all previous code to use the new tool. It's ok for multiple established patterns to live in the same codebase, so long as everyone has an understanding of what the "correct" pattern should be for all new code.
[+] jakefromstatecs|1 year ago|reply
The consistency that they're referring to specifically is to do with consistency in the way that certain features or functionality is implemented.

To make your example match, it would be more like this: there are two teams, A and B. Team A already created a framework and integration for logging across the entire application. Team B comes along, doesn't realize that this framework exists, and invents its own framework and integration for logging.

This is the type of consistency that the author points to, because Team B could have looked at other code already referencing and depending on the logging framework from Team A and they would have avoided the need to create their own.

[+] ted_bunny|1 year ago|reply
It's about minimizing cognitive load.
[+] lilyball|1 year ago|reply
"Consistency for consistency's sake" is usually a misinterpretation of "Consistency because there are reasons for the way things are already done and you don't understand those reasons well enough to diverge". If you understand the current system completely, then you can understand when to diverge from the system (though this is usually better expressed as "when to improve the system" rather than doing something completely new). If you don't understand the current system, then you can't possibly ensure that you haven't missed something in your shiny new way of doing things.
[+] 0xbadcafebee|1 year ago|reply
"Coding defensively" is perhaps the understatement of the year. Good software architecture is, in my opinion, the single most powerful tool you have to keep things from becoming an unmanageable mess.

If I could give one "defensive coding" tip, it would be for seniors doing the design to put in road blocks and make examples that prevent components from falling for common traps (interdependency, highly variant state, complex conditions, backwards-incompatibility, tight coupling, large scope, inter-dependent models, etc) so that humans don't have to remember to avoid those things. Make a list of things your team should never do and make them have a conversation with a senior if they want to do it anyway. Road blocks are good when they're blocking the way to the bad.
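One cheap way to build such a road block (a sketch with made-up module names): a check that fails CI when a module imports from a layer it shouldn't, so nobody has to remember the rule.

```python
# Toy layering check: fail when "handlers" imports "db" directly.
import ast

FORBIDDEN = {("handlers", "db")}  # (importing module, forbidden target)

def layering_violations(module: str, source: str) -> list:
    """Return names of imports in `source` that break the layering rules."""
    tree = ast.parse(source)
    bad = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if (module, alias.name.split(".")[0]) in FORBIDDEN:
                    bad.append(alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module:
            if (module, node.module.split(".")[0]) in FORBIDDEN:
                bad.append(node.module)
    return bad

assert layering_violations("handlers", "import db\nimport json\n") == ["db"]
assert layering_violations("handlers", "import json\n") == []
```

Wiring this into CI turns "please don't couple handlers to the database" from a review comment into a conversation-with-a-senior checkpoint.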

Starting with good design leads to continuing to follow good design. Not starting with good design leads to years of pain. Spend a lot more time on the design than you think you should.

[+] forty|1 year ago|reply
I have been working on a code base that is now 14 year old for many years (almost since the beginning), and is now well over 1M LoC of typescript (for Nodejs) - we are only 20-30 engineers working on it, rather than the 100-1000 suggested on the article. And I can say I couldn't agree more with the article.

If you have to work on such projects, there are two things to keep in mind: consistency and integration tests.

[+] jamesfinlayson|1 year ago|reply
> integration tests

Yes. I remember working in a 700,000+ line PHP codebase that had around 30% unit test coverage and an unknown percentage of e2e test coverage. I kept my changes very localised because it was a minefield.

Also, the unit tests didn't do teardown so adding a new unit test required you to slot it in with assertions accounting for the state of all tests run so far.
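For contrast, the fix for that no-teardown problem (a generic sketch, not the PHP codebase in question): give each test a fresh fixture so no test depends on the residue of the ones that ran before it.

```python
import unittest

class FakeStore:
    """Stand-in for shared state the tests mutate."""
    def __init__(self):
        self.rows = []

class OrderTests(unittest.TestCase):
    def setUp(self):
        self.store = FakeStore()   # fresh state before every test

    def tearDown(self):
        self.store.rows.clear()    # explicit cleanup after every test

    def test_add_one(self):
        self.store.rows.append("a")
        self.assertEqual(len(self.store.rows), 1)

    def test_add_another(self):
        # Passes regardless of test order, because setUp reset the store.
        self.store.rows.append("b")
        self.assertEqual(len(self.store.rows), 1)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
assert result.wasSuccessful()
```

Without the `setUp` reset, the second test would see two rows and every new test would have to account for all the state left behind by its predecessors.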

[+] zzbzq|1 year ago|reply
Wrong, wrong. Opposite of everything he said. All his examples are backwards. The article is basically inversing the Single Responsibility Principle.

First of all, consistency does not matter at all, ever. That's his main thesis, so it's already wrong. Furthermore, all his examples are backwards. If you didn't know about the existence of "bot" users, you probably don't want your new auth mechanism to support them. Otherwise, the "nasty surprise" is the inverse of what he said: not that you find you don't support bot users, but that you find out you do.

Build stuff that does exactly what you want it to do, nothing more. This means doing the opposite of what he said. Do not re-use legacy code with overloaded meanings.

[+] adamc|1 year ago|reply
I really liked this: "as a general rule, large established codebases produce 90% of the value."

People see the ugliness -- because solving real problems, especially if business practices are involved, is often very messy -- but that's where the value is.

[+] o_nate|1 year ago|reply
A big part of this advice boils down to the old adage: "Don't remove a fence if you don't know why it was put there." In other words, when making changes, make sure you preserve every behavior of the old code, even things that seem unnecessary or counter-intuitive.
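One concrete way to "keep the fence" is a characterization test: before changing anything, record what the old code actually does, quirks included. The function below is an invented example, not from the article.

```python
def legacy_price(qty: int) -> float:
    """Stand-in legacy code with an undocumented quirk."""
    total = qty * 2.5
    # Quirk: quantities over 100 silently get a 10% discount.
    return total * 0.9 if qty > 100 else total

# Characterization tests: pin current behavior, counter-intuitive or not,
# so any refactor that changes it fails loudly.
assert legacy_price(10) == 25.0
assert legacy_price(200) == 450.0  # the "fence" we must preserve
```

If the discount turns out to be a bug rather than a fence, that becomes a deliberate, visible decision to change the pinned value rather than an accidental regression.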
[+] sirmarksalot|1 year ago|reply
Consistency is often helpful, but you also need to be wary of cargo culting. For example, you see a server back end that uses an ORM model and you figure you'll implement your new feature using the same patterns you see there. Then a month later the author of the original code you cribbed comes by and asks you, "just out of curiosity, why did you feel the need to create five new database tables for your feature?"

I know, that's a pretty specific "hypothetical," but that experience taught me that copying for the sake of consistency only works if you actually understand what it is you're copying. And I was also lucky that the senior engineer was nice about it.

[+] ram_rar|1 year ago|reply
>The other reason is that you cannot split up a large established codebase without first understanding it. I have seen large codebases successfully split up, but I have never seen that done by a team that wasn’t already fluent at shipping features inside the large codebase

This doesn't resonate with me. Having worked with multiple large codebases (5M+ LOC), splitting the codebase is usually a reflection of org structure and the bifurcation of domains within eng orgs. While it may seem convoluted at first, it's certainly doable and gets easier as you progress. Also, code migrations of this magnitude are usually carried out by core platform-oriented teams, which rarely ship customer-facing features.

[+] jongjong|1 year ago|reply
I'm now working on a codebase which is quite large (13 micro-services required to run the main product); all containerized to run on Kubernetes. The learning curve was quite steep but luckily, I was familiar with most of the tech so that made it easier (I guess that's why they hired me). The project has been around for over 10 years so it has a lot of legacy code and different repos have different code styles, engine versions and compatibility requirements.

The biggest challenge is that it used to be maintained by a large team and now there are just 2 developers. Also, the dev environment isn't fully automated so it takes like 20 minutes just to launch all the services locally for development. The pace of work means that automating this hasn't been a priority.

It's a weird experience working on such a project because I know for a fact that it would be possible to create the entire project from scratch using only 1 to 3 services max, and we would get much better performance, reliability, maintainability, etc... But the company wouldn't be willing to foot the cost of a refactor, so we have to move at a snail's pace. The slow pace is because of the point mentioned in the article: the systems are all intertwined and you need to understand how they integrate with one another in order to make any change.

It's very common that something works locally but doesn't work when deployed to staging because things are complicated on the infrastructure side with firewall rules, integration with third-party services, build process, etc... Also, because there are so many repos with different coding styles and build requirements, it's hard to keep track of everything because some bug fixes or features I implement touch on like 4 different repos at the same time and because deployment isn't fully automated, it creates a lot of room for error... Common issues include forgetting to push one's changes or forgetting to make a PR on one of the repos. Or sometimes the PR for one of the repos was merged but not deployed... Or there was a config or build issue with one of the repos that was missed because it contained some code which did not meet the compatibility requirements of that repo...