I had the experience of inheriting a codebase that was halfway through being “strangled”, and it was a nightmare. The biggest reason is that it's not a "fail-safe" way to plan a project. In this particular case, a full replacement was probably a 12-month affair, but due to poor execution and business needs, priorities shifted 6 months in. It was full of compromises. In some places, instead of replacing an API completely, the new layer would call into the old system and then decorate the response with some extra info. Auth had to be duplicated in both layers. Debugging was awful.
While some of the issues could be chalked up to "not doing it right," at the core of it, the process of strangulation described in the article leaves the overall architecture in a much more confusing state for the lifetime of the project, and if you have to shift priorities, you've created vastly more tech debt than you had with the original service, as you now have a distributed systems problem. Unless you can execute on it quickly, I think it's a very dangerous way to fix tech debt: it avoids fixing the core issues and instead plans for a happy path where you can just replace everything.
If you absolutely think you need to quarantine the existing code, I'd recommend putting a dedicated proxy in place that routes either to the old service or the new service, and not mixing the proxy and the new code. That separation of concerns makes it much easier to debug, and vastly reduces the likelihood of creating a system of distributed spaghetti. What I'd really recommend, though, is understanding the core codebase that powers the business and making iterative improvements there, rather than throwing it all out.
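A minimal sketch of that dedicated-proxy idea (all hostnames, paths, and names here are hypothetical, not from the comment above): the proxy's only job is to decide which backend owns a given route, so the migration state lives in exactly one place.

```python
# Hypothetical strangler-proxy routing table: the proxy holds the only record
# of which endpoints have been migrated; old and new services stay unaware.

MIGRATED_PREFIXES = {"/api/v2/users", "/api/v2/billing"}  # moved to the new service

OLD_BACKEND = "http://legacy.internal:8080"   # assumed hostnames, for illustration
NEW_BACKEND = "http://rewrite.internal:9090"

def route(path: str) -> str:
    """Return the backend base URL that should handle this request path."""
    if any(path == p or path.startswith(p + "/") for p in MIGRATED_PREFIXES):
        return NEW_BACKEND
    return OLD_BACKEND
```

As endpoints get rewritten, you add a prefix to `MIGRATED_PREFIXES`; rolling back is deleting a line. That is what makes a dedicated proxy easier to debug than mixing the routing into the new code itself.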
This is the case when you get involved in any mid-refactor system or codebase. It has to evolve to its destination, and that always happens over time, bit by bit. Quitting in the middle of a murder isn't likely to give you the trunk full of insurance money, nor the piña coladas on the beach living happily ever after with the victim. These projects, once started, are best seen through; otherwise you end up in a worse position than you started in.
I am in a similar position. We started the strangling process 2 years ago, but because management didn't want to disrupt old clients by shifting them to the new code (same functionality, updated UI), the strangling strategy has basically shifted into us maintaining two copies of many features. Success!
My recent experience with an ERP, specifically some major bolt-on modules, was that the vendor simply made the switch to a new platform that had maybe 60% of the capabilities. A roadmap (which has actually been fairly accurate) showed about 3 years to get to 90%.
New customers were pushed to the new product. Existing ones were encouraged to do so and to temporarily live without prior features (usually with temp workers doing things manually) for a deep discount. Those who had to stay with the legacy system were told to expect nothing but bug fixes and compliance-related updates (for federal programs and reporting requirements), and that if they needed anything more than that, they'd either need to build their own bolt-on (there was a robust, if clunky, SDK) or pay contractors to do so.
It sucked, yeah, but it seemed like a reasonable way to go about such a transition that was always going to make people unhappy.
This is more or less the model that Basecamp uses with their rewrites. New product with new features and a strong encouragement to come along, but guaranteed support if you can't.
I'm in the middle of a rewrite. It's very challenging, but the alternative is worse (a sinking ship). My lessons learned:
1. Do it sooner
2. Get full commitment from stakeholders
3. Agree on feature freeze
4. Get it done quickly
5. Don't overpromise, especially about the timeline
6. Focus on delivering big/important items first (MVP)
7. Appoint a benevolent dictator rather than assembling a committee, to avoid second-system syndrome
8. Have test scenarios ready (black box)
Unfortunately, these all depend on one another; e.g., the longer you wait to start the rewrite, the harder it will be to finish (feature creep).
I will write a blog post when it's done successfully; otherwise I will hide under a rock.
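Item 8 in the list above ("Have test scenarios ready (black box)") can be sketched as a characterization test that drives the old and new systems through the same scenarios and demands identical observable output. Everything in this sketch (the function names, the pricing logic, the scenarios) is a hypothetical illustration, not from the comment:

```python
# Hypothetical black-box characterization test: run identical scenarios
# against the legacy system and the rewrite, and compare only the outputs.

def legacy_quote(items: list) -> dict:
    """Stand-in for the old system's observable behavior (assumed)."""
    total = sum(items)
    return {"total": total, "discount": 10 if total > 100 else 0}

def rewritten_quote(items: list) -> dict:
    """Stand-in for the rewritten implementation under test."""
    total = sum(items)
    return {"total": total, "discount": 10 if total > 100 else 0}

# Scenarios are captured before the rewrite starts, while the old system
# is still the source of truth.
SCENARIOS = [[5, 10], [60, 70], [], [100]]

def check_parity() -> bool:
    """True if the rewrite matches the legacy system on every scenario."""
    return all(legacy_quote(s) == rewritten_quote(s) for s in SCENARIOS)
```

The point of keeping the scenarios black-box is that they are written against observable behavior only, so the same suite survives the rewrite and doubles as the contract for the feature freeze.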
Of course, that approach is difficult to apply if the interface is a significant part of, or deeply entangled with, the pain points that the rewrite is intended to solve.
It is also difficult if there's an ill-defined interface that exposes implementation details, or no interface at all.
It is also difficult to apply if we are not talking about a server/client app but a desktop app being rewritten in a different language or an incompatible GUI toolkit.
We did it very differently in our group:
1. The developers of the old tool continued to work on it.
2. A new team took requirements from the old team and filtered them to make them more meaningful.
3. They designed a system architecture that would work with the targeted workflow.
4. They designed a minimal version and ran it under new branding next to the old one.
5. They reached feature parity with the old one and dumped it.
The important thing to note is that the new tool does not do everything the old tool does. The workflow is also different from the old one. However, the customers loved the new one, as it was simpler, faster, and more robust to use.
No, you are not the only one. The quest for the new shiny thing is stronger than ever today. New frameworks, new languages, silver bullets everywhere. Good decision-making frameworks are in tremendous need in the technology world to help everyone understand the ramifications of the choices they're trying to make.
I mostly work in the Android world, and the chase for the new and shiny is real.
I see some new libraries get a lot of traction seemingly only because they are written in Kotlin/coroutines, not because they offer a better solution (for the one I have in mind, they did not even bother to benchmark it against the existing solutions).
The thing is, the Android dev ecosystem got WAY better in some aspects.
Having moved to some of the new and shiny myself: well-implemented MVI/MVVM architectures backed by Rx or Flow are very robust and give you a good framework to develop on.
You still have to fight back against the zealots yelling that solution X, which works just fine, should be replaced by solution Y, even though it would take months of engineering work and doesn't really improve anything you care about (e.g., a 5% reduction in bytecode size is not worth spending weeks on, nor is a network stack that hand-wavingly 'improves performance' with no benchmark to ascertain where our hot paths actually are).
PS: As for the parts that got worse, the build system and Android Studio failed to scale quickly enough to keep up with the enormous increase in build complexity. As a result, they are slowly becoming less and less usable for large projects.
I specialise in legacy code. Not that I'm opposed to doing a greenfield project now and again, but I genuinely enjoy working with legacy code. It's a fun challenge and the "stink of old" on it keeps the must-have-shiny people away for the most part.
However, it's always a challenge. Sometimes you have subsystems that are begging to be retired. For example, on one system we're maintaining about 20 KLOC of GWT code. For the last 10 years or so, it hasn't really been worth moving away from it, but there will be a day (which is rapidly approaching, I think) when the cost of supporting a mostly abandoned Java framework that compiles to JS outweighs the risk and cost of slowly replacing it.
There's a real difference between being pissed off at the choices your predecessors made, or lusting after the hot new framework, and saying: nope... this just isn't viable any more. Planning that transition isn't easy either. Again, it's one of the reasons I enjoy this kind of work.
And sometimes, you even just decide that you're going to work with what you've got. Ironically, though, this usually involves more churn, NIH and reinventing the wheel because code written 20 or 30 years ago did not have the facilities that we desire in modern development. You think, I'd love to enjoy the benefits of that new framework, but there ain't no way that we'll be able to use it. How do I get the benefits using the code I already have? Answer: you study what other people are doing and you build the same damn thing in your environment. Nobody builds the new-shiny for old stuff so if you want it, you have to build it yourself.
I enjoy bonsai trees. As trees grow, the branches become out of scale with the trunks. You can imagine that if your trunk is the size of a pencil, it doesn't take long for the branches to catch up. So if you want a tree that is in scale, you are constantly having to prune off branches and grow new ones. There is a saying that a bonsai tree is never finished until it is done. Code is the same way. There is no such thing as avoiding churn -- unless you are truly trying to kill off your project. You always have to prune off branches and grow new ones; otherwise development will slowly grind to a halt, as the challenge of adding functionality to an unchanged code base gets more and more complex. But if you prune your branches before they grow, you will end up with a stick in a pot. Or if you decide that you want to grow out every bud that pops out, then you will have an impenetrable mass of confusion. Deciding which branches to grow and which to prune unfortunately requires good taste.
I’ve had to learn to reinvent not only wheels created by someone else before I came to a company, but also wheels that I created myself when I didn’t know what I know now.
One thing that complicates matters somewhat (as if they were not already complicated) is at the decision point marked isRoundtrip? in the fourth (penultimate) diagram, where the affirmative case is handled within the new system.
Given, however, what is being posited -- a legacy system that is not modular and which contains unrefactorable pathological dependencies -- the old system must also handle this case in parallel, in order to be in the correct state to handle future requests of a type that still need to be delegated to the old system.
This parallel implementation may have to persist well into the replacement process, and the requirement for it to do so may mean that you still have to do double implementation of features and fixes for most of the transition.
> Here’s the plan:
> Have the new code act as a proxy for the old code. Users use the new system, but it just redirects to the old one.
> Re-implement each behavior in the new codebase, with no change from the end-user perspective. Progressively fade out the old code by making users consume the new behavior. Delete the old, unused code.
Here is the reality:
1. People do the above incompletely; their deletion of the old system slows down and then they move on to another project or organization, leaving a situation in which 7% of the old system still remains.
2. People iterate on the entire above process, ending up with multiple generations of systems, which still have bits of all their predecessors in them.
I think an overlooked aspect of a legacy system that makes "strangling" difficult is that nobody fully understands the behaviours of the system anymore.
It is really hard to replace the functionality of a piece of code when you don't know 100% what that functionality is.
I'm working on moving some functionality out of a system - not replacing the system. And it's still extremely challenging to actually figure out everything that's going on with just the thing I'm moving out.
I see it working for backend code; legacy UI systems have way more coupling, so a complete rewrite would be better there. If you have a legacy framework A and you start replacing it with framework B, component by component, the new components will have to follow the practices of framework A, and basically you end up writing legacy-style code in the new framework B, which is much worse than having legacy framework A, because framework B is now written in a completely alien way and not how it was intended to be used.
I have written a set of libraries and dev tools (like a better REPL) for Perl (the FunctionalPerl project) with the idea of helping write better code in that language, and of giving me, and whoever joins in such efforts, a way to hopefully save a legacy code base. Maybe when a company reaches the point where they feel their code base has become unmaintainable, it can still be saved by using the tools and programming approaches that I can provide. That (other than, and more than just, "because I can") is the major motivation for why I invested in the project. But I wonder how much it will help; I haven't had the chance to try it out so far. I've gotten to know companies that have begun to split up Perl applications into microservices and then move the individual services to other languages, and they don't necessarily have an interest in my approach. I'm also very diffident about reaching out to more companies, worrying about how much pain it would be to deal with (and how likely it would be to fail); investing my time into newer tech (Haskell, Rust, etc.) instead looks tempting in comparison. Should I continue to reach out to companies to find the right case (presumably working as a contractor, with a big bonus if successful)? Any insights?
I'm dealing with a rewrite at the moment (that is, I was hired to start rewriting an existing web application). I want to apply this pattern but the existing codebase was already dated by the time it was written. It's a huge load of mixed responsibilities, globals (it's a PHP backend), RPC-like http API (every request is a post containing an entity name, action, parameter, and additional parameters handled in a big switch), etc. Files of 13K lines of code.
So far I'm stuck in the overthinking phase of the new application. And as the article states, I'm asked to keep adding new features to the existing application - nothing big (because individual things aren't big), but at the same time, I've been adding a REST API on top of the existing codebase for the past few weeks. It's satisfying in a way but it hurts every time I have to interact with the existing codebase and figure out what it's doing.
Plus we're not going to get rid of the existing application at this rate. I should probably set myself limits - that is, I'll postpone and refuse work on the existing application if it's not super critical. And quit if they're not committed to the rewrite before the summer.
Strangling is a good way to slowly replace a system by simply starting to work around it until whatever value it adds is so diminished you can safely pull the plug.
Big software rewrites are extremely risky because they inevitably take more time than people are able to estimate, and the outcome is not always guaranteed.
An evolutionary approach is better because it allows you to focus on more realistic short-term goals and to adapt based on priorities. Strangling is essentially evolutionary, and much less risky. It boils down to deciding to work around, rather than patch up, the old software, and to minimize further investment in it.
Also, there are some good software patterns out there for doing it responsibly (e.g. introducing proxies and then gradually replacing the proxy with an alternate solution).
The old code worked, but was slow. Adding features would make it slower. Lock-free queues and threads everywhere, packet buffers bouncing from input queues to delivery queues to free queues to free lists, threads manfully shuttling them around, with a bit of actual work done at one stage.
Replaced it all with one big-ass ring buffer and one writer process per NIC interface. Readers in separate processes map and watch the ring buffer, and can be killed and started anytime. Packets are all processed in place, not copied, not freed, just overwritten in due time.
It took a few months. Now a single 2U server and a disk array captures all New York and Chicago market activity (commodity futures excepted).
I kept the part that did the little work and scrapped the rest. C++, mmap, hugepages FTW.
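The single-writer ring buffer described above can be sketched roughly like this. This is a toy in-memory version for illustration only; the real system uses mmap'd hugepages and one writer process per NIC, and every name and size here is invented:

```python
# Toy single-writer ring buffer in the spirit of the design above: records
# are written in place and simply overwritten once the ring wraps; readers
# keep their own cursors and tolerate being lapped.

RING_SLOTS = 8  # tiny for illustration; a capture ring would be enormous

class Ring:
    def __init__(self):
        self.slots = [None] * RING_SLOTS
        self.seq = 0  # total records ever written (monotonic)

    def write(self, packet):
        """Writer side: overwrite the oldest slot; never allocate or free."""
        self.slots[self.seq % RING_SLOTS] = (self.seq, packet)
        self.seq += 1

class Reader:
    """Readers are independent of the writer and of each other, so any
    reader can be killed and restarted at any time without blocking."""
    def __init__(self, ring):
        self.ring = ring
        self.cursor = 0

    def poll(self):
        """Return the next packet, skipping forward if the writer lapped us."""
        if self.cursor >= self.ring.seq:
            return None  # nothing new yet
        if self.ring.seq - self.cursor > RING_SLOTS:
            self.cursor = self.ring.seq - RING_SLOTS  # we were lapped
        seq, packet = self.ring.slots[self.cursor % RING_SLOTS]
        self.cursor = seq + 1
        return packet
```

The design choice worth noticing is that there is no free list and no copying at all: "freeing" a packet is just the writer eventually overwriting its slot, which is what eliminates the queue-shuffling overhead of the old system.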
Having successfully replaced a legacy system one time, we got it to work by turning the legacy system's business logic into a library that the new system could use. The key is to replace just the underlying architecture without reimplementing years of work.
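A rough sketch of that approach, with every name and rule invented for illustration: the legacy business logic is lifted verbatim behind a small library boundary, and only the architecture around it is new.

```python
# Hypothetical example of legacy business logic extracted as a library.
# legacy_price() stands in for "years of work" that we keep verbatim;
# only the service layer around it is rewritten.

def legacy_price(sku: str, qty: int) -> int:
    """Imagine this is the old system's battle-tested logic, lifted as-is."""
    base = {"WIDGET": 250, "GADGET": 400}.get(sku, 0)
    total = base * qty
    if qty >= 10:
        total = total * 90 // 100  # bulk-discount rule nobody remembers adding
    return total

class QuoteService:
    """New architecture (could be a fresh HTTP service, a queue worker, etc.)
    that delegates every business decision to the legacy library."""
    def quote(self, sku: str, qty: int) -> dict:
        return {"sku": sku, "qty": qty, "price_cents": legacy_price(sku, qty)}
```

Because the odd old rules travel with the library, the rewrite never has to rediscover them, which is exactly the "don't reimplement years of work" point.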
What the article describes is a rewrite!
In the end there will be no more legacy code left...
What the article is saying is: don’t rewrite your code in one go, but rather cut the system into independent pieces and rewrite each in successive phases.
It’s kind of obvious, though. And the difficult part of the rewrite is actually slicing the original code into independent chunks. More often than not, legacy systems are riddled with leaky abstractions and dependencies (the infamous spaghetti code), which is hell to disentangle.
Often, the clients of legacy code are old too, and are hard coded to access it.
I've done this, but on a private branch, with a single merge to trunk in the end. Starting with complex integration tests, new interfaces were gradually defined and made the code testable, giving me the needed confidence.
So, how can this be applied to mobile app development? I can think of adding dependencies and new code alongside the old code in the app, but that will cause considerable bloat in the app's size, which can be noticed by management, unlike with web services/sites/apps.
If your old system has dependencies that you don't understand, I don't see the strangulation method working at all.