top | item 4557919

Ask HN: I just inherited 700K+ lines of bad PHP. Advice?

52 points| ohmygord | 13 years ago | reply

So, I've just inherited a very large, very badly written monstrosity. Including javascript, template files etc, it breaks the 1 million LOC barrier. I'm looking for some advice and strategies that you guys might have used in similar situations, in particular on:

- getting a handle on the code base - communicating 'progress' to the client - not losing the will to live

The software is based on vtiger, an open-source CRM that has a (deserved) reputation of being incredibly badly written, that has since been badly hacked apart by several different companies with wildly differing ideas. My client currently have 150+ installs and 150+ angry clients.

Words fail me trying to describe the state of the software.

- no niceties such as MVC, ORMs, a DBAL, or a modular design - all DB queries are inline SQL, with tens of inner joins on most queries - dizzying call stack, yet reams of copy+paste code

The best part: the code will often query the DB and execute PHP code contained in the response, or load and run arbitrary files and modules as dictated by parsing particular DB fields. The one page I have studied in detail generates 105 DB queries in the simple case.

The DB itself is even worse. There are over 600 tables, as well as views, custom functions, cascades and (but of course) triggers. There is no consistent naming schema, very few explicit foreign key references (despite being heavily, heavily entwined) and I have already discovered several tables that don’t have primary keys, but are referenced by exact string matches on things like date stamps.

I wont mention the table-based HTML, javascript, lack of version control etc.

I’m not sure if its even possible to give relevant advice (besides perhaps ‘run screaming’), but if anyone here has come through a similar situation and has any advice to share, I would be deeply grateful.

Help me HN - you're my only hope. (PS. 2K char limit sux)

75 comments

order
[+] lhnz|13 years ago|reply
1. Get it onto version control.

2. Make sure there is some workable strategy for deploying and testing the code.

3. Ask somebody to provide you with a list of the changes, or else try to create some kind of diff against the original version of the code. If you can see crazy stuff here then find out who did it...

4. Ask somebody what the biggest bugs are? Which things are causing clients the most problems?

5. Try to establish which convention is 'winning' in the codebase. But you might want to create a more sensible convention which will allow unit testing (start this immediately!)

6. At this point, ask if you can hire people to work on this with you as it's a big problem, and you need to free yourself up for the rewrite.

7. If that isn't possible then leave. You have done enough to make your CV better and a company which passes you something like this does not care about your career.

[+] mtrimpe|13 years ago|reply
First read the Fowler's 'Refactoring' book; it was written just for you. Then:

1. Identify a small and easily separable piece of code (what you woud call a component in a normal system.)

2. Write tests covering every (important?) edge-case of the piece of code you want to rewrite.

3. Mercilessly refactor until it's nice and squeeky clean.

4. Lather, rinse and repeat.

And of course, make sure your client acknowledges that it's a giant clusterf... and is on board with you pulling the system out of the stone age.

Also, if you want to make life a bit more interesting for yourself, get the PHP code's AST and programmatically rewrite existing code to shared conventions for kicks.

[+] ck2|13 years ago|reply
I started writing some suggestions but you know what - someone is dumping this on you because they didn't care and their predecessors didn't care, etc. They probably make far more than you for doing far less.

The moment you start touching the code, you are going to start being blamed for the nightmare preceding you. It could even affect your career if future employment researches where you worked previously and gets told you made the mess in the first place.

[+] pasbesoin|13 years ago|reply
My thoughts are in line with this. Are you (honestly) being hired or promoted to fix this mess, or to keep things running?

If the former, has the incredible scale and scope of this been properly identified, addressed, and acknowledged? Are you guaranteed anything near the resources to (try to) accomplish this (including your own time, without traipsing far into overtime)? Is the current state documented sufficiently to obviate any and all future attempts to blame you?

If the latter (more likely, I suspect), well... I guess the simplest question is, do you have an agenda and an exit strategy that leaves your career intact? (And your health...)

Maybe, given the particulars, this is a real opportunity for you. But that's not spelled out at all, nor obviously implied, in your post. And given that this situation was allowed to develop to this extent in the first place, and that you have angry clients to deal with, right off the bat, it doesn't sound promising.

Do you like playing the role of unacknowledged hero who falls on his sword and is cursed by his clueless fellows, while some other protagonist goes on to get the girl?

There is a lot of downside, here. What's the upside? Do the organization's goals and commitments match your personal ones?

[+] strictfp|13 years ago|reply
Yes, and the chance that the OP will get the codebase in order sounds minimal. It's simply too large, it would take one person many years to clean up a mess like that.
[+] fallous|13 years ago|reply
I've faced similar Augean stables in the past (and present, unfortunately).

I'd suggest that correcting the DB is one of the last things you can do, especially if queries are scattered throughout the code instead of in functions. You could attempt to abstract it with an internal API and as you update the codebase replace with calls to the API. Once fully abstracted, you can then focus on getting the DB corrected and only need to modify the new API functionality.

As to the code itself, sit down and map out all the verbs and nouns in the system. If you have a Contact noun, what is the definition of that role and what verbs can be applied to it or what verbs could it do. This gives you a good map for creating functions that can then be used to replace existing inline stuff.

Triage the worst bugs or performance bottlenecks and see if they are particular to a noun and/or verb, which should give you an obvious starting place to begin refactoring. For emergency hotfixes and such, feel free to just tweak the existing crap code but otherwise try and work on your functional units to get ahead of the game.

And always remember, pimpin' ain't easy. ;)

[+] bobfromhuddle|13 years ago|reply
+1 for this. The first job is to stabilise the system by fixing critical bugs.

As you're doing so, move all those queries into one big fat DB class, and when you spot groups of related queries, split them out into their own classes.

The next priority should be to get rid of the PHP from the DB - if need be create another huge class with a zillion if-else statements.

You need to modify the code to simplify it. You don't need to improve the design, you just need to dumb it down until you can understand where all the parts are. Stabilise, simplify, then refactor.

[+] sp332|13 years ago|reply
But first... version control!
[+] citricsquid|13 years ago|reply
What is most important is what are you trying to achieve. Are you trying to make the system as stable as possible at the lowest cost to the client or are you trying to bring this system into a future proof state and the client is willing to pay for that?

Personally if I were in your position I would explain to the client that I'm happy to temporarily fix some bugs but long term the system needs to be rewritten. 700k lines of code is a lot, but the way you've described it I get the feeling most of that code is needless. Depending on what the system actually does you could conceivably rebuild in a few months.

[+] mtrimpe|13 years ago|reply
I've spent quite some time the cross-component spaghetti code large companies sometimes write.

I've come to believe that the skill of not rewriting from scratch but forcing yourself to slowly refactor (as per Martin Fowler's definition) existing systems into a proper state is one of the most important skills you can develop.

That way, once you've refactored most of the system (which includes adding tests for all the important functionality) you can indeed confidently rewrite everything. If you do it any sooner than that though, you're in for a world of pain.

[+] burningion|13 years ago|reply
I was in this exact same situation two years ago, only with an eCommerce platform which shall remain anonymous. The client had gone through 4 companies, trying to get the project built and finished. Nobody had been able to wrangle it clean.

Reluctantly, I took on the project, and started working through it. What I initially estimated would take me a month to untangle ended up taking a year. That's an entire year in the snake pit. And since it was eCommerce, there was serious money on the line when it came to bugs. And there were hundreds that I found.

Just understand the commitment you're making. Make sure your client has the money and the time to make things right. Ultimately, in my project, the client insisted on quick hacks to keep competition at bay, and the code dissolved into a mess. I decided I couldn't keep working with a codebase that was never given a chance.

Understand what you're getting yourself into. Because you're taking over the responsibility of loading a massive crap ton of software into your head for diagnosis. How long do you want to have fragile, crappy, lazy code in your head? Forget about bring superman, you're not going to save a million lines of code, you're going to become the builder of the hacks that work around absurdity. Make sure you understand that.

The burden of broken code you're responsible for, that's always broken in production is like nothing I've ever encountered. Make sure it's worth it.

[+] toddmorey|13 years ago|reply
So the cold reality for this client is that the codebase will have to be replaced over time. You are not trying to escape legacy simply for the lure of something new, you are trying to escape insanity.

I would talk to the client about focusing your effort on helping them transition--a small piece at time--to a sane architecture. If they aren't open to that, they aren't your client. I've been a business leader in a position where we had to make really tough and painful decisions about coding projects gone awry. I don't envy their position, but continuing forward with this monster does not seem to be in the long-term interests of the company.

[+] mattmcknight|13 years ago|reply
This is the beauty of the web- the transition to a sane architecture can be done page by page- without any visibility to the user. Every time you have to make a change- fix an old feature or add a new one, you replace it with the new architecture. Even if it's not a whole page- you can load in a partial with javascript. The challenge is holding back from going too deep on the refactoring all at once. Functional tests around the system have to be added to keep your sanity as part of the change process.
[+] wissler|13 years ago|reply
This is the best response. To be a bit more general: evaluate whether the client is actually willing and able to take the sane course of action starting NOW and continuing over the long term. If not, and you stay, there's no advice that's going to be much help. Your life will be hell.
[+] schulz|13 years ago|reply
1. I'm so sorry.

2. Set up a development environment and deploy the code there. Get it working. With code that large (and with the added wrinkle of executing code out of the database) changing things is going to be a nightmare of unintended consequences. Getting a testable environment up will let you find those things and help you understand what it does.

3. Get it in version control. This should be number 1: Before you make changes get a baseline of where it was.

4. Find a bug that exemplifies the nastiness of the whole situation and make a fuss. Let everybody know why this bug is so bad and what caused it. This will give your employer a concrete example to look at when you say "this code is shite". Harp on this bug.

5. Fix that one bug. Roll it out. Be a hero.

At this point you'll have a good base line, some credibility, and the organization will understand what a mess they've got. Now you'll have to figure out what you want to accomplish: keep it limping along? Improve it? Rewrite? The above steps will get your feet under you.

[+] beck5|13 years ago|reply
I would start by spending a week or two wrapping the application in Acceptance tests (cucumber/capybara) just so when you do make a change you are able to quickly find out to a semi decent level of confidence things are ok.

I would also recommend Working Effectively with Legacy Code By Robert C. Martin.

Good luck!

[+] eigenrick|13 years ago|reply
Can't agree more with this. The way you reduce risk when migrating code (especially code you don't fully understand) is to build strong integration and end to end tests.

This will allow you to be more aggressive when replacing crap code with new functionality.

Given this huge mound of code, this process will be quite a drag, but it will pay off in spades in the long run. Trust us.

[+] wpietri|13 years ago|reply
My tips:

Raise your rates. If you just took the gig, you'll have to wait a bit. But you should price your work such that whether they say yes or no, you have no regrets.

Expose the problem. Start inventorying the issues. Track them in the same way that you track other work. As you do things that the client has requested, track real vs actual time. E.g., "This change took 12 hours; if the code base were clean, it would have taken 1."

Estimate the size of the problem. Talk in terms of technical debt. E.g., "Module X needs 120 hours of work to bring the code to commonly accepted standards of code quality." The clients are thinking, "We have a system we paid $1m for, so it's an asset worth $1m." Expose the debt and they will have a better idea of the true value of the code base.

Look for opportunities to declare tactical bankruptcy. Once you have numbers, you can show that some portions of the code base will be cheaper to rewrite than to clean up. Help your clients make good financial decisions about when to just toss and rewrite particular parts of the code.

Don't let them make you crazy. I'd recommend something like a kanban board to track work and strictly limit work in process. This system is probably a mess because the client is insane. Develop some very clear, very firm boundaries that keep them from driving you crazy as well. If you are lucky, they will, over time, learn from you to behave rationally about software.

[+] pfisch|13 years ago|reply
You should quit. Life is short, there is no reason for you to spend it doing that.
[+] jsmartonly|13 years ago|reply
* Rewrite.

* NEVER modify existing one. Once you change one line of comment, you own all the code and problem from that point.

* If rewrite is not allowed, then ask huge pay raise for this work. Basically it is not about money, it is about bring everyone on the same page on the status of he existing solution.

* If the above does not work out, prepare to switch to another project, or quit the job totally.

[+] wanderr|13 years ago|reply
This might sound sacreligious to the many vim fans here, but get a good IDE, it will help you get a handle on what the code is doing and let you navigate around faster, which is especially handy if the execution path for accomplishing any one thing involves dozens of files. A good IDE will also point out blatant errors, and a really good IDE will point out potential errors as well. I personally really like PHPStorm by JetBrains, the code inspection tool is quite good. I was recently able to cut the size of our code base in half by using it to identify tens of thousands of bugs, a lot of them on inspection were "this never worked" type bugs, which with a little digging I was able to confirm could never be called. Eliminating code also makes refactoring the remaining code easier because you have fewer interdependencies to worry about.
[+] scotty79|13 years ago|reply
Don't touch anything. Run. There's no glory for you in this.
[+] stilkov|13 years ago|reply
First of all, make sure the client and you agree on what the long-term strategy is. Then get buy-in for a first step in this direction.

If the system is as bad as you make it sound, the long-term goal has got to be a complete replacement of the existing codebase. That will usually require as much effort, and thus money, as writing the existing version did. (Experience shows there's is no reason to confidently assume different.)

Then, explain how to get there without doing a (hopeless) complete rewrite in one big bang:

First, you need to make a set of decisions for the new code you're going to write, i.e. the language, framework(s), architecture, whatever you want to new code to be based on.

Next, you try to modularize the existing system so that you can replace one tiny part.

That's going to be really, really hard - modularizing a systems after it's in production always is. Don't do it all at once: If you've isolated some small piece of so that there's a clear interface (based on a programming language API, a database interface or (my favorite) some RESTful HTTP API), rewrite that small piece using your new technology stack and integrate it with the existing monstrosity.

Once you have done that successfully for some small aspect, you have some sort of proof that this approach can work.

Then, over the next months (or more likely: years), rinse and repeat.

This is a hugely expensive thing to do, but that shouldn't come as a surprise – after all, you're replacing the organs in a living body while it's running a marathon on its last breath. An MBA should understand that the additional cost is because this strategy drastically reduces the risk.

You can explain to the customer that they can try this out using just a small part, and decide whether or not they want to continue afterwards. Point out that you're going to start with those parts that produce most of the existing pain. Explain that you're helping them to return to a situation in which they have fewer bugs, can introduce new features quickly and easily, and best of all, that the end result will be a system that's modularized, hopefully ensuring that they won't run into the same situation again.

If they expect you to do magic, i.e. maintain the mess and magically turn it into a good piece of software without being allowed to actually change it significantly, get out of the contract as quickly as you can.

[+] beering|13 years ago|reply
You should make sure to set expectations. If your employer only wants the minimal amount of maintenance done, then don't do any more. You could go to heroic lengths to repair the codebase, but if that's not what they're asking for it will be in vain.

Second, I suggest applying as many tools as you can. A modern version control system, of course, and keeping any version control history that you inherited (although it sounds unlikely).

A powerful IDE might also let you start cutting out crap immediately, so try PHPStorm or Eclipse+PHP (or both!) and see what they can tell you.

And start writing tests as you start making changes, because you'll likely break something seemingly unrelated when you start changing things.

[+] RivieraKid|13 years ago|reply
700k+ codebase and a single developer? That sounds crazy.

Run away from this. Trying work with this code would make you stressed and frustrated, which will have a significant negative impact on your productivity.

If the company plans to add features to this software, they should hire more than one developers and perhaps rewrite it from scratch.

Edit: Also send your boss link to this discussion :)

[+] pmtarantino|13 years ago|reply
A few weeks ago I was commited with something like that. I just quit the job, I couldn't sleep at night and I was not making any progress in the first days. I know there is a learning curve, but it had been two weeks and I couldn't do anything. Be sure you can work with that before accepting, because then is really hard.
[+] kellishaver|13 years ago|reply
I had a similar experience a few months ago, myself, with a huge, very poorly written PHP app. I lasted three weeks; three weeks in which I didn't sleep and felt constantly agitated while I spent nearly every waking moment at the computer trying to be productive amidst an ocean of stress. My whole family suffered from this project, because I became very difficult to live with.
[+] ScottBurson|13 years ago|reply
Lots of good advice in here. I lean toward the "run screaming" side myself.

The fact that they've burned through several contracting companies and still think it's possible to get large numbers of bug fixes in the first month suggests that they're pretty clueless. It's going to take you two or three months (if not longer) just to get your head into the code enough that you can fix anything nontrivial.

If I were advising them, I would say they need two teams: one team of just two or three people (or maybe just one) solely trying to fix bugs in the existing code, and the other team of three or four people doing a complete rewrite from scratch. As others have commented, complete rewrites are normally a bad idea, but this code base is so far gone I don't think it can be incrementally refactored into sanity. Oh, and they should expect the rewrite to take two years.

But despite their experience to this point, it sounds like they're still not ready to hear that. Which leaves you little choice but to run screaming.

EDITED to add: what these people need to understand is that their demand for results in a hurry is what got them into this mess in the first place. Until they get that I don't think there's any hope.

[+] grey-area|13 years ago|reply
The first thing I would do, before doing any work, would be to sit down with the client/your boss and explain just how bad the situation is, that drastic solutions are in order, and that it will take years to get this under control (with 1m LOC and 150 clients presumably all running customised software, this would take years to sort out even with a large team working on it). Unless they understand that from the beginning you will never get the backing you need to sort this out.

This might be one of the few occasions where a complete rewrite is justified (if you can keep scope limited to reproducing what you have). You've said the code is incredibly complex, and if the problem domain is incredibly complex too, you're probably stuck refactoring. If the problem domain is pretty simple (a CRM without too many extra features might be), you may be better starting with your smallest client who uses the product the least, asking for all the pain points, and things they love, about the current software, and writing a simple CRM to cover their needs which replicates the features of the current product, then gradually porting other clients over to the new system and adding new features to it, while keeping the old code-base in maintenance mode and fixing serious bugs only. If you do a rewrite you'd have to port the 150 clients over 1 by 1, and leave the other code in maintenance mode - your primary client may not be at all happy with that.

If that's not possible, you'll have to refactor it slowly while keeping the code in place, so the first step is to get it into version control, sort out a sane deployment strategy with testing servers, then try improving some small isolated areas of the code for one of the clients in isolation. Good luck!

[+] thaumaturgy|13 years ago|reply
I have actually worked on a code base like you're describing, on a contract basis, for a client. I loathe maintenance programming, so the relationship didn't last very long -- just a few months. So:

1. Make sure you have a rock-solid contract in place with the client that will ensure that you get paid, get paid well, and get paid often. Receiving a check in the mail makes it easier to look at the code. If your payment terms are anything like, "payment-upon-completion of ...", or, "paid net 30 after invoice", or anything like that, you simply won't want to work on the code.

2. If "soul-crushing", "depressing", or "makes me want to hang myself" are phrases you'd use to describe the code or your state of mind when looking at it, then go into this project knowing that you're not going to last long. There are people who genuinely enjoy working on stuff like this. You aren't one of them.

3. Everybody that says "rewrite" is dreaming. It is impossible to rewrite something that large without breaking something and spending too much money. Re-factoring a function is doable. Re-factoring a thousand-line file is doable. Re-factoring part of a database is doable. Re-factoring all of it all at once is starry-eyed fiction. Not gonna happen.

4. But, if taking ownership of this code base is something you want to do, then add re-factoring time in to your agreement with the client -- something like, "20% time spent replacing bad code" -- and focus on the tiniest little ugly thing you can find, and re-factor that. Start on it, don't stop until it's done. Keep it in small bite-sized chunks.

5. Make sure you're getting paid for time spent just getting familiar with the code base. If you work with it long enough you'll actually get pretty familiar with most of it, but you want to do that on their dime, not yours.

6. Get help. They surely realize by now that they've got a mess on their hands. Talk with them about whether or not you can bring on additional help. If they flat-out refuse, run. (That is what killed my work with my client; I wanted to move into a position where I managed a junior programmer and focused on code rewrites and higher-level stuff; they refused, I quit. They wanted an employee, not a contractor.)

7. Version control and a sane bug tracking system (Mantis isn't horrible) are must-haves. If they don't have these, again, make sure they pay for it.

Dealing with a code base like this one is as much about state-of-mind as anything else. Either you can handle it or you can't. No amount of advice here will make it more palatable to you if you're not the sort of person that's OK with inheriting a disaster.

Also, even if you've got some kind of agreement in place with the client already, it sounds like you've just now gotten your first look at the code. This, in my opinion, makes it totally OK to go back to the client and re-negotiate. You can open it with, "I'd like to work with you, but now that I've seen the project that you want me to work on, I can understand why this has been a problem for you, and I need to make sure that we can come to an agreement that will work for both of us so that I can fix this for you." (Or something.)

[+] robomartin|13 years ago|reply
" Everybody that says "rewrite" is dreaming. It is impossible to rewrite something that large without breaking something and spending too much money. Re-factoring a function is doable. Re-factoring a thousand-line file is doable. Re-factoring part of a database is doable. Re-factoring all of it all at once is starry-eyed fiction. Not gonna happen."

I don't think so. Yes, the opportunities to do something like this are rare. It is up to the project lead and the client to decide whether or not this makes sense.

Also, keep in mind that a "complete re-write" doesn't necessarily literally mean that every line of code must be re-written. There's often tons that can be salvaged.

If the code base is an absolute disaster I would not touch it without the understanding that the project might entail massive re-writing of portions of the codes base as well as significant structural modifications. Maybe I'm lucky in that I've never really had to go look for work. I would flat-out reject a project like this without massive client buy-in.

If it is mission critical for the client and they can afford it there is no reason not to step back, truly evaluate the situation and consider a significant redo of the app.

For any non-trivial enterprise having a solid and maintainable code base is nearly priceless. Is it worth investing a year and the corresponding financial commitment to fix the problem once and for all? For the right business, yes! The alternative is to live with a patch-work of code for the next ten years of more.

Because I move across disciplines I have seen this sort of thing in many areas outside of just software code-bases.

I have, as an example, seen data processing facilities with millions of dollars in equipment designed in patch-work fashion that bleed money on a daily basis. In one such cases I proposed a complete redoing of the facility (in staged fashion in order to not affect business). It was very costly, but the owners where under such pain due to the constant bleed that they saw the intelligence in investing a lot of money to lay down an infrastructure that would withstand the test of time, not to mention stopping the bleeding.

Similarly, I have seen this in faulty processes. Process optimization or redesign can be critical to a business. The most well known example of this is the automobile industry.

Car manufactures like Mercedes were devoting fully 20% of their factory floor space to repairs. Cars would come off the assembly line with defects that would have to be repaired after the fact. This consumed a tremendous amount of time, money and resources.

In sharp contrast to this, companies like Toyota where using an approach that aimed to have cars come off the line with zero defects. They'd stop the assembly line when an defect was detected. At first they nearly couldn't make cars. The philosophy was to ensure that detected defects never re-occurred. With time cars started to come off the line with few, if any, defects. Most car manufacturer have now adopted these ideas.

The point is that sometimes a "complete rewrite" is warranted and even necessary. On cannot categorically state that the idea of a re-write is "fiction" any more than stating that it is an absolute necessity while being completely detached from the players and their circumstances. I suggest that it is for the client and consultant to evaluate and decide.

On a personal note. I don't enjoy working with crap. I enjoy my craft. Whether it is writing code, designing electronics or mechanical. I enjoy doing good work and working in quality projects. Life is too short to work on shit projects. You learn nothing and nobody is happy.