My favorite tool for trying scary complicated things in an unknown space is the feature flag. This works even if you have zero tests and no documentation. The only thing you need is the live production system and a way to toggle the flag at runtime.
If you can ship your hypothesis along with an effectively unaltered version of prod, testing things without breaking other things becomes much more feasible. I've never been in a real business scenario where I wasn't able to negotiate a brief experimental window during live business hours for at least one client.
While very powerful, I think it's worth calling out some pitfalls. A few things we've run into:
- long lived feature flags that are never cleaned up (which usually cause zombie or partially dead code)
- rollout drift where different environments or customers have different flags set and it's difficult to know who actually has the feature
- not flagging all connected functionality (e.g. one API is missing the flag that should have had it)
Feature flags are like bloom filters. They make 98 out of 100 situations better and they make the other 2 worse. When performance is the issue that’s usually fine. When reliability is the issue, that’s not sufficient.
If you work on fifty feature toggles a year, one of them is going to go wrong. If your team is doing a few hundred, you’re gonna have oopsies.
Most of the problematic cases are where the code is set up so that the old path and the new one can’t bypass each other cleanly. They get tangled up, and maybe the toggle gets implemented inverted, so it’s difficult to remove the old path without breaking the new one.
You can go even further with something like the Scientist gem at the application level, or tee-testing at the data store level. Compare A and A', record the result, and return A. Eventually, you reach 100% compatibility between the two (or only deviations that are desirable) and can remove A, leaving only A'.
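The compare-and-return-A loop fits in a few lines. Scientist itself is a Ruby gem; this is just the pattern sketched in Python, with made-up function names:

```python
import logging

def experiment(name, control, candidate, *args):
    """Run old (control) and new (candidate) side by side:
    record any mismatch, but always return the control result."""
    result_a = control(*args)
    try:
        result_b = candidate(*args)
        if result_b != result_a:
            logging.warning("%s: mismatch: control=%r candidate=%r",
                            name, result_a, result_b)
    except Exception:
        logging.exception("%s: candidate raised", name)
    return result_a  # callers only ever see A

# Hypothetical old and new implementations being compared.
def old_total(items):
    return sum(items)

def new_total(items):
    return sum(sorted(items))  # refactored path under test

print(experiment("totals", old_total, new_total, [3, 1, 2]))  # 6
```

Because the candidate's result and exceptions are swallowed, A' can be arbitrarily wrong without ever affecting callers; the logs tell you when it stops being wrong.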
I also like recording and replaying production traffic, as well, so that you can do your tee-testing in an environment that doesn't affect latency for production, but that's not quite the same thing.
You’ve just resolved a problem I had. I had this problem on a search engine, but I shipped the change as a “v2” and told customers to switch to v2. And you know the v2 problem: discrepancies that customers like. So both versions have fans, but we really need to pull the plug on v1. You’ve just solved it: I should have indexed even records with v1 and odd records with v2. Then only I would know which engine was used.
Write tests. Most likely those 300k lines of code contain a TESST folder with 4 unit tests written by an intern who retired to become a bonsai farmer in the 1990s, and none of them pass anymore. Things become much less stressful if you have something basic telling you you're still good.
The problem with complex legacy codebases is that you don’t know about the myriad edge cases the existing code is covering, which will only be discovered in production on customer premises, wreaking havoc two months after you shipped the seemingly regression-free refactor.
I've been working on React and React Native applications professionally for over ten years, and I have never worked on a project with any kind of meaningful test coverage.
This is a good method if you are stuck and you don't know what you need to do. It also helps explore a project with a specific task in mind.
It is not very useful in giving you confidence your changes would not cause unexpected side effects, which is usually the main problem working with legacy code.
If you want confidence when working with legacy code, your best bet is the strangler fig pattern: find the boundaries of the module you want to work on, rewrite the module (or clone it and make your changes), run both at the same time in shadow mode, monitor and verify that your new module behaves the same as the old one, then switch over and eventually delete the old module.
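That lifecycle can be sketched as a facade with three phases, assuming a single entry point you control (all names here are invented): old-only, shadow, and new-only, where shadow mode is where you watch for mismatches before switching.

```python
# Strangler-fig facade: one entry point, three phases.
# "old"    -> only the legacy module runs
# "shadow" -> both run; the new result is compared but discarded
# "new"    -> the legacy module is no longer called and can be deleted
mismatches = []

def legacy_handler(req):
    return req.upper()

def rewritten_handler(req):
    return req.upper()  # the rewrite, being verified against legacy

def handle(req, mode="shadow"):
    if mode == "old":
        return legacy_handler(req)
    if mode == "shadow":
        old = legacy_handler(req)
        new = rewritten_handler(req)
        if new != old:
            mismatches.append((req, old, new))  # monitor this
        return old  # the old module stays authoritative
    return rewritten_handler(req)  # mode == "new": switch complete

print(handle("ping"))   # PING, via shadow comparison
print(len(mismatches))  # 0 -> safe to switch
```

The point of the middle phase is that switching becomes a data-driven decision: you flip to "new" only once the mismatch count stays at zero over real traffic.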
Also known as "Make the change easy, then make the change"
Something to realize is that every codebase is legacy. My best new feature implementations are always several commits that do no-op refactorings, with no changes to tests even with good coverage (or adding tests before the refactoring for better coverage), then one short and sweet commit with just the behavior change.
I also do this and try to teach it to others. One thing I add is trying to go even further and making it so the new feature can essentially be a configuration change (because you built the system already in the first steps). It doesn't fit every situation so it's by no means a hard rule, but "prefer declarative functionality over imperative".
The more experienced I get the more I see how these simplified techniques that might be useful to a journeyman engineer in the right context can fail horribly in the wrong context.
For this particular example, the first question I have is why are we upgrading the ORM? As a codebase grows and matures, the cost of ORM change increases, and so too must the justification for upgrading it increase. Any engineer worth their salt needs to know this justification and have it in mind at all times so they can apply appropriate judgment as the discovered scope increases. Let's now assume the change is justified.
The next critical question is: how do you know if you've broken anything? Right in the intro the author talks about an "untested and poorly documented codebase", but then in the example uses basic compilation as a proxy for success. I'm sorry to be harsh, but this completely hand-waves away the hard part of the problem. To have any confidence in a change like this you need a sense of what could go wrong and guard against those things, some of which could be subtle data corruption that is extremely costly and hard to unwind later. This may involve logging, side-by-side testing, canary deployments, additional monitoring, static analysis, and/or any number of other techniques, applied based on an understanding of what the upgrade actually means under the hood for the ORM in question, combined with an analysis of what risks that entails for the system and business process in question. Drawing a mindmap of your refactoring plan is barely more than what IntelliJ (let alone Claude Code) can already do at the click of a button.
Of course, working in a legacy codebase is also torture.
Software development is a hyper-rational endeavor, so we don't often talk about feelings. This article also does not talk much about feelings.
Reading between the lines, it looks like reverting the code is supposed to affect how you feel about the work. Knowing that failure is an explicit option can help to set an expectation; however, without a mature understanding of failure, that expectation may just be misery.
With a mature understanding of failure, the possibility of a forced rollback should help you "let go" of those changes. It's like starting a day of painting or drawing with one that you force yourself to throw away; or a writing session with a silly page.
----
If someone thinks that they are giving you good advice, but it sounds terrible, then maybe they are expecting you to do some more work to realize the value of that advice.
If you are giving someone advice and they push back, maybe you are implying some extra work or expectations that you have not actually said out loud.
Maybe it is the framing of the step as a "reversion" or "roll-back" rather than "spike" or "prototype" that is causing that sense. Personally, I would never throw away the code I spent time and effort writing just to stick to a systematized refactoring method like this "Mikado." I don't think the advice is unsound, and I have done exactly this many times in my own career, but instead of throwing it away I would shelve it, put it in a branch, write a document about what has been/needs to be done, and write a corresponding tech debt or feature/fix ticket for it with the new and realistic estimate.
While great in theory, I think it almost always fails because of non-existent test coverage of the areas you're modifying. I change something, and if there's no immediate build or compile error, that (depending on the system) usually does not mean I'm safe. A lot of issues happen at the interfaces (data in/out of the system) and in certain advanced states and contexts. I wouldn't know how Mikado helps here.
In other words, I'd reword this to using the Mikado method to understand large codebases, or get a first glimpse of how things are connected and wired up. But to say it allows for _safe_ changes is stretching it a bit much.
Yes, most of the time such spaghetti code projects don't have any tests either. You may have to take the time to develop them, working at a high level first and then developing more specific tests. Hopefully you can use some coverage tools to determine how much of the code you are exercising. Again this isn't always feasible. Once you have a decent set of tests that pass on the original code base, you can start making changes.
Working with old code is tough, no real magic to work around that.
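One low-magic way to get that "decent set of tests that pass on the original code base" is characterization tests: pin whatever the old code does today, right or wrong, so any change becomes visible. A sketch, where `legacy_discount` is a hypothetical stand-in for real legacy code:

```python
# Characterization ("golden master") testing: record the legacy
# system's current outputs and assert that future runs still match.
def legacy_discount(total):  # stand-in for the real legacy function
    if total >= 100:
        return total * 0.9
    return total

# Step 1: capture current behavior over representative inputs.
golden = {total: legacy_discount(total) for total in (0, 50, 99, 100, 250)}

# Step 2: after any refactor, replay the same inputs and compare.
def check(fn):
    return [t for t in golden if fn(t) != golden[t]]  # diverging inputs

print(check(legacy_discount))  # [] -> behavior unchanged
```

Coverage tools then tell you which branches the golden inputs never reach, which is where to add more representative inputs before refactoring.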
Is it possible in practice to control the side effects of making changes in a huge legacy code base?
Maybe the software crashes when you write 42 in some field and you're able to tell it's due to a missing division-by-zero check deep down in the code base. Your gut tells you you should add the check but who knows if something relies on this bug somehow, plus you've never heard of anyone having issues with values other than 42.
At this point you decide to hard code the behavior you want for the value 42 specifically. It's nasty and it only makes the code base more complex, but at least you're not breaking anything.
Does anyone have experience with this mindset of embracing the mess?
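Sketched as code, the 42 special case from the comment above might look like this (everything here is hypothetical, invented to match the scenario):

```python
def process_field(value):
    # Known crash: value 42 hits a division by zero deep in the
    # legacy code. Rather than "fix" the divisor check (something
    # may rely on the bug for other values), pin the behavior we
    # want for 42 specifically and leave everything else untouched.
    if value == 42:
        return 0  # hard-coded result for the one known-bad input
    return legacy_process(value)

def legacy_process(value):
    return 100 // (value - 42)  # divides by zero exactly at 42

print(process_field(43))  # 100, legacy path untouched
print(process_field(42))  # 0, nasty but nothing else breaks
```

The guard narrows the blast radius to exactly the input that was already broken, which is the whole appeal, and the whole cost, of the approach.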
All. The. Time. And I hate it. Imagine giving a customer a rebate based on buggy code. You fix a bug, the customer comes back and wants to check that the rebate was correct that last time. Now you have to somehow hard-code the rebate they did get so that your (slightly less buggy) code gives the same result. But hard-coding has the risk of introducing other errors on its own. Oh yes, and you never have enough time to do things properly because Customers (or maybe Management). A tangled mess of soul-destroying, lifeblood-sucking code and pressures ensues.
I've never seen code truly get that bad, but I can already think of several problems with that approach.
Do you really know all of the expected behavior you're hardcoding in? What happens if your hardcoded behavior is just incorrect enough that it breaks something somewhere else? How can you be sure that your test for that specific value is even correct?
I think the better approach is to let things break naturally and open a bug with your findings. You'd be surprised how often someone else knows exactly what's going on and can fix it correctly. Your hacks are not just pouring gasoline onto the fire, but opening a well directly underneath that will keep it burning for a long time.
I've been a few times in a situation where I needed to make significant changes in a huge codebase with lots of tests but also a lot of corner cases, on my own.
I've spent blood, sweat, tears, and restless evenings scrolling and ctrl-f-ing through huge build and test logs to finally accomplish the task.
But let's take a step back.
So they assign you to get that done. You're supposed to be careful, courageous, and precise while making those changes without regression. There's very little up-to-date documentation on the design or architecture, let alone any rationale for design choices. You're supposed to come up with methods like Mikado, TDD, shadowing, or anything that gets the job done.
Is this even fair to ask? Suppose you ask a contractor to refactor a house with old-style plumbing and electricity. Will they do it Mikado style, or would they say: look, we're going to tear things down and rebuild from the ground up. You need to be willing to pay for a designer, an architect, new materials, and a set of specialized contractors.
So why do we as software engineers put up with the assignment? Are we rewarded so much more than the project manager of that house, who subcontracts the work to many people to tear down and rebuild?
To extend your analogy: if the house is a listed building (a UK concept; apparently the US equivalent is being listed in the National Register of Historic Places), by law you cannot just tear it down. You need to do much more work to renovate what can be renovated without disturbing the original structure. This obviously costs much more and is generally done by different specialists, who have a harder job and hence are better paid. So the question comes back to: what kind of work do you want to do...
If you're paid by the hour, then does it really matter if you have to refactor stuff? If it takes a long time to do then it'll be more expensive for your employer.
Does the project manager get paid more by the hour to refactor a house than to build one?
Using a Mikado style graph for planning any large work in general has been really useful to me. I used it a lot at both Telia back in 2019 and Mentimeter in 2022.
It gives a great way to visualise the work needed to achieve a goal, without ever mentioning time.
For me nowadays it goes like this:
- try to locate the relevant files
- now build a prompt: explain the use case or the purpose of the refactor. Mention the relevant files and describe how you understand they interact and work together. Also explain how you think the code needs to be refactored. Instruct the model to analyze the code and propose different solutions for a complete refactor. Tell it not to implement anything, just plan.
Then you’ll get several paths of action.
Choose one and tell the model to write it into a file you’ll keep around while the implementation is ongoing, so you won’t pollute the context and can start over on each chunk of work in a clean prompt.
Name the file refactor-<name>-plan.md and tell it to write the plan step by step and dump a todo list that takes dependencies into account, for tracking progress.
Review the plan, make fixes if needed. You need some sort of table resembling a todo list so the model can track and make progress along the way.
Open a new prompt, tell it to analyze the plan file, go to the todo list section, and proceed with the next task. Verify it’s done, and update the plan. Repeat until done.
I’ve been using a form of the Mikado Method based on a specific ordering of git commits (by message prefix) along with some pre commit hook scripts, governed by a document: https://docs.eblu.me/how-to/agent-change-process
I have this configured to feed into an agent for large changes. It’s been working pretty well, still not perfect though… the tricky part is that it is very tempting (and maybe even sometimes correct) to not fully reset between Mikado “iterations”, but then you wind up with a messy state transfer. The advantage so far has been that it’s easy to make progress while ditching a session context “poisoned” by some failure.
The things that always get me with tasks like this is that there are *always* clear, existing errors in the legacy code. And you know if you fix those, all hell will break loose!
I'd like to hear more about people who have jumped onto large codebases and were instantly productive. I see a lot of emphasis on documentation and comments, but in my experience they get stale real fast.
I think there are similar methods, such as nested todo-lists. But DAGs are exceptionally good for this use case of visualising work (Mikado graphs are DAGs).
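Because a Mikado graph is a DAG, "what can I safely do right now" is just its leaves, and a depth-first walk gives a prerequisites-first work order. A toy sketch with invented goal names:

```python
# A Mikado graph: each goal maps to the prerequisites discovered
# while attempting it. Leaves (no prerequisites) are the safe,
# immediately doable steps; work proceeds leaves-first.
graph = {
    "upgrade ORM": ["replace deprecated query API", "bump driver"],
    "replace deprecated query API": ["extract repository layer"],
    "extract repository layer": [],
    "bump driver": [],
}

def work_order(graph):
    done, order = set(), []
    def visit(node):
        if node in done:
            return
        for prereq in graph.get(node, []):  # prerequisites come first
            visit(prereq)
        done.add(node)
        order.append(node)
    for node in graph:
        visit(node)
    return order

print(work_order(graph))
# every prerequisite appears before the goal that needs it;
# the original goal ("upgrade ORM") comes out last
```

This is exactly what a nested todo-list can't express: a prerequisite shared by two goals appears once in the DAG and once in the work order.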
Ah, no: incremental approaches only work in already well-formed code.
Poor code requires not coding but analysis and decisions, partitioning code and clients. So:
1. Stop writing code
2. Buy or write tools to analyze the code (modularity) and use-case (clients)
3. Make 3+ rough plans:
(a) leave it alone and manage quality;
(b) identify severable parts to fix and how (same clients);
(c) incrementally migrate (important) clients to something new
The key lesson is that incremental improvements are sinking money (and worse, time) into something that might need to go, without any real context for whether it's worth it.
1. take a well known method for problem solving basically any programmer/human knows
2. slap on a cool word from the land of the rising sun
3.???
4. profit!
This article is painfully pretentious and stealth marketing for a book
So you do things one step at a time and timebox as you go? This method probably doesn't need its own name. In fact I think that's just what timeboxing is.
FWIW Mikado seems to be the name of that game where you pick up one stick at a time from a pile while trying not to disturb the pile. (I forget the exact rules.) So it isn’t as if somebody is trying to name this method after themselves or something; it is just an attempt at an evocative made-up term. Timeboxing is one too, right? I mean, “timeboxing” is not recognized by my spell checker (I’d agree that it is more intuitive, though).
There are important additions beyond timeboxing, at least according to the post. Notably, reverting your changes if you weren't able to complete the chosen task in the time box and starting over on a chosen subset of that task. I can imagine that part has benefits, though I haven't tried it myself.
Why do people promote things that are unnecessarily complicated?
I assume something about this appeals to a certain psychology. Here is the essence of the method for people who dislike rituals: pick one little thing you can succeed at. Do that thing. Repeat as necessary.
It goes a lot further than plan mode though, in fact I would say the key difference of mikado refactors from waterfall refactors is that you don’t do all the planning up front with mikado. If anything you try to do as little planning as possible.
A good decom/cleanup strategy definitely helps
Mikado is more of a get out of jail card for getting trapped in a “top down refactor” which is an oxymoron.
Advice is plagued by the tacit knowledge problem.
Then by definition you have the smallest safest step you can take. It would be the leaf nodes on your graph?
(seriously though, this book has answers for you: Working Effectively with Legacy Code, by Michael Feathers)
Is that the Mikado method?
Using a programming language that has a compiler, lucky.