Ask HN: Inherited the worst code and tech team I have ever seen. How to fix it?
557 points| whattodochange | 3 years ago
- this code generates more than 20 million dollars a year of revenue
- it runs on PHP
- it has been developed for 12 years directly on production with no source control ( hello index-new_2021-test-john_v2.php )
- it doesn't use composer or any dependency management. It's all require_once.
- it doesn't use any framework
- the routing is managed exclusively as rewrites in NGINX (the NGINX config is around 10,000 lines)
- no code has ever been deleted. Things are just added. I gather the reason is that it was developed directly on production, so deleting things is too risky.
- the database structure is the same mess, no migrations, etc... When adding a column, because of the volume of data, they add a new table with a join.
- JS and CSS are the same. Multiple versions of jQuery fighting each other depending on which page you are on, or even on the same page.
- no MVC pattern of course, or whatever pattern. No templating library. It's PHP 2003 style.
- In many places I see controller-like files making curl requests to the app's own REST API (via domain name, not localhost), doing OAuth authorizations, etc... Just to get the menu items or list of products...
- no caching ( but there is memcached but only used for sessions ...)
- team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.
- productivity is abysmal which is understandable. The mess is just too huge to be able to build anything.
This business unit has a pretty aggressive roadmap as management and HQ have no real understanding of these blockers. And post COVID, budget is really tight.
I know a full rewrite is necessary, but how to balance it?
[+] [-] codingdave|3 years ago|reply
But before you re-write one line of code - get some testing in place. Or, a lot of testing. If you have end-to-end tests that run through every feature that is currently used by your customer base, then you have a baseline to safely make changes. You can delete code as long as the tests pass. You can change code as long as the tests pass.
Once you are at that point, start picking off pieces to modernize and improve.
Also, respect the team. Maybe they aren't doing what you would, but they are keeping this beast alive, and probably have invaluable knowledge of how to do so. Don't come in pushing for change... come in embracing that this beast of a codebase makes 20 million a year. So talk about how the team can improve it, and modernize their skills at the same time.
Because if you walk in, saying, "This all sucks, and so do you, let's throw it out", do you really have to wonder why you are hitting resistance?
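One way to get the kind of baseline described above is a characterization ("golden master") check: record what the current system returns for a set of requests, then compare after every change. The following is only a minimal Python sketch under stated assumptions: `fetch` is a hypothetical callable that returns the page body for a path, and the sample paths and pages are made up for illustration.

```python
import hashlib
from typing import Callable, Dict, Iterable, List


def fingerprint(body: str) -> str:
    """Hash a response body so the stored baseline stays small."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def record_baseline(fetch: Callable[[str], str], paths: Iterable[str]) -> Dict[str, str]:
    """Capture the current output of every path as the known-good baseline."""
    return {path: fingerprint(fetch(path)) for path in paths}


def changed_paths(fetch: Callable[[str], str], baseline: Dict[str, str]) -> List[str]:
    """Return the paths whose output no longer matches the baseline."""
    return sorted(
        path for path, expected in baseline.items()
        if fingerprint(fetch(path)) != expected
    )


if __name__ == "__main__":
    # Stand-in for the legacy app (assumption); a real run would issue
    # HTTP requests against a copy of production instead.
    legacy_pages = {"/menu": "<ul><li>Home</li></ul>", "/products": "<p>42 items</p>"}
    baseline = record_baseline(legacy_pages.__getitem__, legacy_pages)

    # Simulate a refactor that accidentally changes one page.
    refactored = {**legacy_pages, "/menu": "<ul></ul>"}
    print(changed_paths(refactored.__getitem__, baseline))  # ['/menu']
```

Pages that embed timestamps, session IDs or CSRF tokens would need those parts masked before hashing, otherwise every run reports a change.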
[+] [-] skrebbel|3 years ago|reply
As the team’s manager, it’s your job to get buy-in from the executives to gradually fix the mess. You don’t need to tell the team exactly how to fix it, but you gotta get buy-in for space to fix it.
One approach is just to say “every Friday goes to adding tests!” (And then when there’s some reasonable test coverage, make Fridays go to refactorings that are easy with the new tests, and so on).
But this often fails because when Friday comes, something is on fire and management asks to please quickly squeeze this one thing in first.
The only other approach I know of is to get buy in for shipping every change slightly slower, and making the code touched by that change better. Eg they want to add feature X, ok add a test for adjacent existing functionality Y, then maybe make Y a little better, just so adding X will be easier, then build X, also with tests. Enthusiastically celebrate that not only X got shipped but Y also got made better.
If the team is change averse, it’s because they’re risk averse, likely with good reason. Ask for anecdotes to figure out where it comes from. They need to see that risk can be reduced and that execs can be reasonable.
You need the buy-in, both from the execs and the team. Things will go slightly slower in the beginning and it’s worth it. Only you can sell this. The metaphor of “paying off technical debt” is useful here since interest is sky high and you want to bring it under control.
[+] [-] samuellevy|3 years ago|reply
There's so much low-hanging fruit there that's so easy to fix _right now_. No version control? Good news! `git init` is free! PHPCS/PHP-CS-fixer can normalise a lot, and is generally pretty safe (especially when you have git now). Yeah, it's overwhelming, but OP said that the software is already making millions - you don't wanna fuck with that.
I've done it, I've written about it, I've given conference talks about it. The real bonus for OP is that the team is small, so there's only a few people to fight over it. It's pretty easy to show how things will be better, but remember that the team are going to resist deleting code not because they're unaware that it's bad, but because they are afraid to jeopardise whatever stability they've found.
[+] [-] latchkey|3 years ago|reply
That's simply not true. I've inherited something just as bad as this. We did a full rewrite and it was quite successful and the company went on to triple the revenue.
> get some testing in place
Writing tests for something that is already not functional will be a waste of time. How do you fix the things that the tests prove are broken? It is better to spend the time figuring out what all the features are, document them and then rewrite, with tests.
[+] [-] defrost|3 years ago|reply
It's also a juggling job from hell so keep a cool head and seek support and resources for what needs to be done.
A big first step is to duplicate and isolate, as much as possible, a "working copy" of the production working code.
You now need to maintain the production version, as requests go on, while also carving out feasible chunks to "replace" with better modules.
Obviously you work against the copy, test, test again, and then slide in a replacement to the live production monolith .. with bated breath and a "in case of fubar" plan in the wings.
If it's any consolation, and no, no it isn't, this scenario is surprisingly common in thriving businesses.
[+] [-] scarab92|3 years ago|reply
Management need to know that this needs a rewrite, and a more capable team, and that pursuing an aggressive roadmap while things are this bad is impossible.
If they say no, and you try to muddle your way through it anyway, you are setting yourself up to fail.
If they say yes, ask for the extra resources necessary to incrementally rewrite. I would bring in new resources to do this with modern approaches and leave the existing team to support the shrinking legacy codebase.
[+] [-] innocentoldguy|3 years ago|reply
I would normally opt for your suggested approach too. However, based on the description given, I’d most likely recommend a complete rewrite in this case. The architecture appears to be quite poor and the risk of infecting new code with previous bad decision-making may be too great.
[+] [-] forty|3 years ago|reply
Do things progressively. Read the code, figure out the dependencies, find the leaves and start by refactoring those. Do add tests before changing anything to make sure you know if you change some existing behaviors.
Figuring out such a code base as a whole might be overwhelming, but remember that it probably looks much more complicated than it actually is.
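As a rough illustration of "figure out the dependencies, find the leaves": since the original post says everything is wired together with require_once, a first approximation of the graph can be scraped with a regex. This is a minimal Python sketch, not a PHP parser. It assumes include paths are literal strings that are already normalised relative to one root, and the file names in the example are hypothetical.

```python
import re
from typing import Dict, List, Set

# Matches require/include statements with a literal string argument, e.g.
#   require_once 'lib/db.php';   include("header.php");
# Dynamic paths (variables, concatenation) are not resolved by this sketch.
REQUIRE_RE = re.compile(
    r"""\b(?:require|include)(?:_once)?\s*\(?\s*['"]([^'"]+)['"]"""
)


def build_graph(sources: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each file name to the set of files it requires."""
    return {name: set(REQUIRE_RE.findall(code)) for name, code in sources.items()}


def find_leaves(graph: Dict[str, Set[str]]) -> List[str]:
    """Leaves are files that pull in no other known file."""
    return sorted(name for name, deps in graph.items() if deps.isdisjoint(graph))


if __name__ == "__main__":
    sources = {
        "index.php": "<?php require_once 'lib/db.php'; require_once 'lib/menu.php';",
        "lib/menu.php": "<?php require_once 'lib/db.php';",
        "lib/db.php": "<?php // no includes",
    }
    print(find_leaves(build_graph(sources)))  # ['lib/db.php']
```

Leaves found this way are candidates for the first tests and refactorings, because changing them cannot break anything they depend on, only their callers.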
[+] [-] belfalas|3 years ago|reply
For an additional perspective see this classic: https://dhemery.com/articles/resistance_as_a_resource/
[+] [-] jollofricepeas|3 years ago|reply
This is a clear case where he needs to look for another job IMMEDIATELY.
Here’s why…
1. The problems listed are too technical and almost impossible to communicate to a non-technical audience meaning business or c-suite.
2. The fixes will not result (any time soon) in a change that’s meaningful to business like increased revenue or speed to market. Business will not reward you if you are successful or provide resources to get the job done unless the value is apparent to them (See #1).
Employment is a game. Winning that game means knowing when to get out and when to stay.
It’s time to plan your exit both for your own sanity and the good of your family.
[+] [-] rk06|3 years ago|reply
A new person who complains about existing code and proposes "Rewrite everything" in week one will not be met with __respect__
[+] [-] hpcjoe|3 years ago|reply
I'd argue that the first order of business is getting the code committed to SCM. Then you can coach the team on new branches (features/bugs), and build the culture of using the SCM. Do this before going to the execs and giving the 10,000 meter view.
Go to the execs and get buy-in on the scope of what you need. I'd recommend articulating it in terms of risk reduction. You have a $20M revenue stream, and little control/testing over the machinery that generates this. You'll work on implementing a plan to get this under control (have an outline of this ready, and note that you need to assess more to fill in the details). You need space/time/resources to get this done.
Then get the testing in place. Make this part of the culture of SCM use. Reward the team for developing sanity/functionality tests. Get CI/CD going (a simple one, that just works). From this you can articulate (with the team's input), coding/testing standards to be adhered to.
After all this, start identifying the problematic low hanging fruit. Work each problem (have a clear problem statement, a limited scope for the problem, and a desired solution). You are not there to boil the ocean (rewrite the entire thing). You are there to make their engineering processes better, and move them to a more productive environment. Any low hanging fruit will have a specific need/risk attached to it. Like "we drop tables/columns regularly using user input." Based upon the culture you created with SCM/testing, you can have the team develop the tests for expected and various corner cases. From that, you can replace the low hanging fruit.
Keep doing that until the fruit is no longer low hanging. Prove to the execs that you can manage/solve problems. Once you have that done, you can make a longer term roadmap appeal, that is, start looking at what version 2.0 (or whatever number) would look like, and what internal bits you need to change to get there.
Basically, evolution, not revolution. Easier for execs under pressure to deliver results to swallow. Explain in terms of risks/costs and benefits, though in the near term, most of the focus sounds like it should be on risk reduction.
[+] [-] ardit33|3 years ago|reply
The original code was just not salvageable. (It was quickly done as a fast hack, and it would break left and right, causing outages).
The OP needs to understand what the OG system is trying to do, and what it will take to re-write it into something sane. Don't start before understanding all the caveats of the system/project you are trying to re-write.
[+] [-] smcleod|3 years ago|reply
Map out the functionality related to the (hard) requirements and kick off replacing the product(s) with something modern and boring.
[+] [-] benevol|3 years ago|reply
Yes, 3 people creating a revenue of $20 million/year is impressive.
But what if 1, let alone 2 of them quit and/or fall ill? That's way too much risk for this type of revenue.
If a new team member needs a year to just understand how the code is organized, then a well structured and documented rewrite certainly is necessary.
[+] [-] ericmcer|3 years ago|reply
I would try to identify how entangled some of the dependencies are and start my rewrite with the goal of getting rid of them. But yeah, I agree that version control and testing are going to be key here, as any backsliding will probably result in the idea of future refactoring being viewed negatively.
[+] [-] garren|3 years ago|reply
Regarding the team, junior they may be, as he says, but they’re rolling with a multi-million dollar product. If they’re keeping the product going and continuing to add business value, then they’re doing something right. Their engineering practices might be questionable, but they seem to have a solid product.
However, getting testing in place is going to be a challenge. I’ve encountered systems that sound similar to this one (perfectly functional, zero discernible architecture, not remotely designed with any kind of testing in mind.) It’ll be difficult to convince the suits that introducing testing has any real value when you’re starting from zero.
The first thing that comes to mind is the strangler fig pattern. Sounds like a useful idea in this instance.
> …an alternative [to a re-write] is to gradually create a new system around the edges of the old, letting it grow slowly over several years until the old system is strangled.[0]
[0] https://martinfowler.com/bliki/StranglerFigApplication.html
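To make the strangler fig idea concrete: the heart of it is a routing decision in front of both systems, where a route goes to the new code only once it has been migrated and everything else keeps hitting the legacy app. Here is a minimal Python sketch of that decision. The backend URLs and the list of migrated prefixes are hypothetical, and in the setup described in the original post the same rule would more likely live in the existing nginx config.

```python
from typing import Iterable

LEGACY_BACKEND = "http://legacy.internal"  # hypothetical upstream names
NEW_BACKEND = "http://new.internal"


def choose_backend(path: str, migrated_prefixes: Iterable[str]) -> str:
    """Send a request to the new system only if its route has been migrated."""
    for prefix in migrated_prefixes:
        # Match the prefix itself or anything below it, but not look-alikes
        # such as "/menus" when only "/menu" has been migrated.
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            return NEW_BACKEND
    return LEGACY_BACKEND


if __name__ == "__main__":
    migrated = ["/menu"]  # grows one route at a time as pages are rewritten
    for path in ["/menu", "/menu/items", "/menus", "/checkout"]:
        print(path, "->", choose_backend(path, migrated))
```

Rolling back a bad migration is then just removing one prefix from the list, which keeps each step small and reversible.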
[+] [-] smrtinsert|3 years ago|reply
Start with tests can't emphasize this enough.
[+] [-] fny|3 years ago|reply
From a business perspective, nothing is broken. In fact, they laid a golden goose.
> team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.
My mistake, they didn't lay a golden goose--they built a money printer. The ROI here is insane.
> productivity is abysmal which is understandable. The mess is just too huge to be able to build anything.
But you just told me they built a $20M revenue product with 3 bozos. That sounds unbelievably productive.
> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers
You should consider quitting your job.
As far as the business is concerned, there are no problems... because well... they have a money printer, and your team seems not to care enough to advocate for change. Business people don't give a damn about code quality. They give a damn about value. If 2003 style PHP code does that, so be it. Forget a rewrite, why waste time and effort doing simple refactoring? To them, even that has negative financial value.
From their perspective, you're not being paid to make code easy to work with, you're being paid to ship product in a rat's nest. Maybe you could make a business case for why it's valuable to use source control, dependency management, a framework, routing outside of nginx, and so on... but it doesn't sound like any of that mattered on the road to $20M a year, so it will be very difficult to convince them otherwise, especially if your teammates resist.
This, again, is why you should consider leaving.
Some developers don't mind spaghetti, cowboy coding. You do. Don't subject yourself to a work environment and work style that's incompatible with you, especially when your teammates don't care either. I guarantee you will hate your job.
[+] [-] seasoup|3 years ago|reply
First, get everything in source control!
Next, make it possible to spin service up locally, pointing at production DB.
Then, get the db running locally.
Then get another server and get cd to that server, including creating the db, schema, and sample data.
Then add tests, run on pr, then code review, then auto deploy to new server.
This should stop the bleeding… no more index-new_2021-test-john_v2.php
Add tests and start deleting code.
Spin up a production server, load balance to it. When confident it works, blow away the old one and redeploy to it. Use the new server for blue/green deployments.
Write more tests for pages, clean up more code.
Pick a framework and use it for new pages, rewrite old pages only when major functionality changes. Don’t worry about multiple jquery versions on a page, lack of mvc, lack of framework, unless overhauling that page.
[+] [-] larsnystrom|3 years ago|reply
Second: Doing a full rewrite with a junior team is not going to end well. They’ll just make other mistakes in the rewritten app, and then you’ll be back where you started.
You need to gradually introduce better engineering practices, while at the same time keeping the project up and running (i.e. meeting business needs). I’d start with introducing revision control (git), then some static testing (phpstan, eslint), then some CI to run the tests automatically, then unit/integration tests (phpunit), etc. These things should be introduced one at a time and over a timespan of months probably.
I’d also have a sort of long term technical vision to strive towards, like “we are going to move away from our home-written framework towards Laravel”, or “we are moving towards building the client with React Native”, or whatever you think is a good end outcome.
You also need to shield the team from upper management and let them just focus on the engineering stuff. This means you need to understand the business side, and advocate for your team and product in the rest of the organization.
You have a lot of work ahead of you. Be communicative and strive towards letting people and business grow. I can see you focus a lot on the technical aspects. Try to not let that consume too much of your attention, but try to shift towards business and people instead.
[+] [-] unity1001|3 years ago|reply
I would think about that very long. Over the years, experience has shown me that regardless of what framework or library you use, you introduce a lot of dependencies to your app when you build on such a framework or library. The common sense says that they should make things easier, and at the start they definitely do that. But over the years, you start encountering backwards-incompatible changes, major moves etc in those frameworks and libraries which start taking your time. And sometimes a considerable chunk.
I would only use a framework or library that has a major backwards compatibility policy or viewpoint. JSON's 'only add, never deprecate' is a very good ideal to strive to. Even if this couldn't be entirely feasible in software, it should be at least strived to.
So I'd say if something that is built in-house works, is easy to use and is kept maintained, there is absolutely no reason to move to an external framework.
[+] [-] yieldcrv|3 years ago|reply
from a user perspective seeing a php file extension is an accurate predictor of seeing a disorganized mess of everything and a “LAMP stack” stuck in 2003 just as described here
from a developer perspective it’s correlated with everything described by OP
you’re correct it isn’t inherently php’s problem, it can do RESTful APIs and a coherent code design pattern no problem
[+] [-] cool-RR|3 years ago|reply
The problem with this plan is corporate politics. Say that OP takes on this challenge. He makes a plan and carefully and patiently executes it. Say that in six months he's already fixed 30% of the problem, and by doing so he meaningfully improved the team's productivity.
The executives are happy. The disaster was averted, and now they can ask for more features and get them more quickly, which they do.
Congratulations, OP. You are now the team lead of a mediocre software project. You want to continue fixing the code beyond the 30%? Management will be happy for you to take it as a personal project. After all, you probably don't have anything to do on the weekend anyway.
You could stand strong and refuse to improve the infrastructure until the company explicitly prioritizes it. But then why would that job be better than just taking a random position in a FAANG company? The code quality will be better and so will the pay.
[+] [-] rockwotj|3 years ago|reply
Those seem like low hanging fruit that are unlikely to affect prod.
You should also probably spend a decent amount of time convincing management of the situation. If they're oblivious that's never going to go well.
I agree a full rewrite is a mistake and you have to instead fix bite-sized chunks. It also will help to do that if you start to invest in tooling, a deploy story and eventually tests (I'm assuming there are none). If I was making 20 million off some code I'd sure as heck prioritize testing stuff (at least laying the groundwork).
It's probably also worth determining how risk tolerant the product is; you could probably move faster cleaning up if it is something that can accept risk. If it's super critical, I'd seriously prioritize setting up regression testing in some form first.
[+] [-] simonw|3 years ago|reply
1. Commit the entire production codebase to git and push it to a host (GitHub would be easiest here)
2. Set up a cron that runs once every ten minutes and commits ALL changes (with a dummy commit message) and pushes the result
Now you have a repo that's capturing changes. If someone messes up you have a chance to recover. You can also keep track of what changes are being applied using the commit log.
You can put this in place without anyone having to change their current processes.
Obviously you should aim to get them to use git properly, with proper commit messages - and eventually with production deploys happening from your git repository rather than people editing files in production!
But you can get a lot of value straight away from using this trick.
It's basically a form of git scraping: https://simonwillison.net/2020/Oct/9/git-scraping/
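The two steps above could be scripted roughly like this. It is only a sketch in Python, assuming the production directory is already a git repository with a remote and upstream branch configured. The commit message format and the script name in the note below are made up, and the `run` parameter exists only so the logic can be exercised without touching a real repo.

```python
import subprocess
import sys
from datetime import datetime, timezone
from typing import Any, Callable, List


def snapshot_commands(now: datetime) -> List[List[str]]:
    """Build the git commands for one snapshot of the working tree."""
    message = f"Automated snapshot {now:%Y-%m-%d %H:%M} UTC"  # dummy message
    return [
        ["git", "add", "--all"],  # stage new, changed and deleted files
        ["git", "commit", "--message", message],
        ["git", "push"],
    ]


def snapshot(repo_dir: str, run: Callable[..., Any] = subprocess.run) -> bool:
    """Commit and push everything in repo_dir; return False if nothing changed."""
    status = run(["git", "status", "--porcelain"], cwd=repo_dir,
                 capture_output=True, text=True, check=True)
    if not status.stdout.strip():
        return False  # clean tree: skip the empty commit
    for command in snapshot_commands(datetime.now(timezone.utc)):
        run(command, cwd=repo_dir, check=True)
    return True


if __name__ == "__main__":
    # Usage: python3 snapshot.py /path/to/production/root
    if len(sys.argv) == 2:
        snapshot(sys.argv[1])
```

A crontab schedule of `*/10 * * * *` pointing at this script would give the every-ten-minutes cadence mentioned above. Checking `git status --porcelain` first avoids a failing `git commit` when nobody has edited anything.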
[+] [-] beachy|3 years ago|reply
But I would start by choosing how and whether to fix up the crown jewels, the database.
You say that instead of adding columns, team has been adding new tables instead. With such behaviours, it's possible your database is such a steaming pile of crap that you'll be unable to move at any pace at all until you fix the database. Certainly if management want e.g. reporting tools added, you'd be much better to fix the database first. On the other hand, if the new functionality doesn't require significant database interaction (maybe you're just tarting up the front end and adding some eye candy) then maybe you can leave it be. Unlikely I would imagine.
Do not however just leave the database as a steaming pile of crap, and at the same time start writing a whole lot of new code against it. Every shitty database design decision made over the previous years will echo down and make its ugly way into your new nice code. You will be better off in the long run normalising and rationalising the DB first.
[+] [-] bawolff|3 years ago|reply
Some of these things are terrible choices, but some of them are just weird choices that aren't necessarily terrible, or are a minor inconvenience at most.
E.g. no source control - obviously that is terrible. But its also trivial to rectify. You could have fixed that in less time it took to write this post.
Otoh "it runs on php" - i know php aint cool anymore, but sheesh not being cool has no bearing on how maintainable something is.
> "it doesn't use composer or any dependency management. It's all require_once."
A weird choice, and one that is certainly a bit messy, but hardly the end of the world in and of itself.
>it doesn't use any framework
So?
What really matters is if its a mess of spaghetti code. You can do that with or without a framework.
> no caching ( but there is memcached but only used for sessions ...)
Is performance unacceptable? If not, then that sounds like the right choice (premature optimization)...
> the database structure is the same mess, no migrations, etc... When adding a column, because of the volume of data, they add a new table with a join.
Not ideal... but also pretty minor.
Anyways, my point is that what you're describing is definitely unideal, but on the scale of legacy nightmares seems not that bad.
[+] [-] karmicthreat|3 years ago|reply
If you stay you need to manage your relationship with the management team. This involves the usual reporting, lunches etc. You need to set up some sort of metrics immediately. Just quarterly might be sufficient. Nobody is going to care about bug fix counts; your metrics should be around features.
Testing and version control are a good place to start. But you are going to need to get them started there and you will pretty much need to instill good discipline. You will be herding cats for quite a while. If you can't get these two items going well in 3 months then abort and leave. You don't want to stick around for when the money printer stops working and nobody can figure out why.
[+] [-] pabe|3 years ago|reply
We also introduced git as well as dev and staging tiers and some agile methodologies. Definitely do some of that first!
Now, as management and customers are happy, the backend can be refactored step by step. Here, more test coverage might come in handy.
So, I'd recommend to be a bit picky about where to create value. You can restructure the whole database and that'll be good for maintenance (and most likely performance) but management & customers won't literally "see" much. Ask the people with the money for their preferences, excite them to get more runway. Regarding "backend stuff": Think like a Microservice architect and identify components that are least strongly coupled and have a big (performance) impact. Work on those when management is happy and you've got plenty of budget.
Your job is to create value and reduce risk. Not to create something that's technically awesome ;)
[+] [-] stackbutterflow|3 years ago|reply
It's likely there's a lot of history and political shenanigans that OP isn't aware of yet. This could be a sinking ship. If it's a profitable business why is the team made of juniors?
A small company with legacy code that is a huge mess but that is maintained by the same person for the last 10 years is one thing. The same mess in the hands of 3 juniors who don't even use version control means no one with experience has lasted long enough at this company. That's a red flag.
[+] [-] caprock|3 years ago|reply
2. Slowly start extracting code and making small functions. Document like crazy in the code as you learn. Keep the single file or close to it, and don't worry about frameworks yet.
3. Introduce unit tests with each new function if you can.
After all that is done, make a plan for next steps (framework, practices, replace tech etc).
Along the way, take the jr backend engineer under your wing, explain everything, and ensure they are a strong ally.
Call me crazy, but that project sounds like fun.
[+] [-] academia_hack|3 years ago|reply
We did a complete rewrite into a Django application, it took 2 years and untold political pain but was absolutely the correct choice. The legacy code was beyond saving and everyone on the team agreed with this assessment - meaning our political battles were only outward facing.
In order to get support, we started very small with it as a "20% project" for some of our engineers. After level setting auth, cicd, and infrastructure stuff, we began with one commonly used functionality and redirected the legacy php page to the new python-based page. Every sprint, in addition to all the firefighting we were doing, we'd make another stealth replacement of a legacy feature with its updated alternative.
Eventually we had enough evidence that the replacements were good (users impressed with responsiveness, upgraded UI stuff like replacing default buttons with bootstrap, etc.) that we got a blessing to make this a larger project. As the project succeeded piecemeal, we built more momentum and more wins until we had decent senior leadership backing.
Advocating for this change was basically the full time job of our non-technical team members for 2 straight years. We had good engineers quit, got into deeply frustrating fights with basically every department in the company and had a rough go of it. In the end though, it did work out very well. Huge reduction in cost and complexity, ability to support really impactful stuff for the business with agility, and a ton of fulfilling dev experience for our engineers too.
All this is to say, I understand where everyone warning you not to do a rewrite is coming from. It's a deeply painful experience and not one to be embraced lightly. Your immediate leadership needs to genuinely believe in the effort and be willing to expend significant political capital on it. Your team also needs to be 100% on board.
If you can't make this happen and you're not working on a business which does immense social good and needs your support as a matter of charity, you should quit and go somewhere more comfortable.
[+] [-] catears|3 years ago|reply
My impression from others in this thread is that they mean "start from scratch and build until features are on-par with current product" when they say full rewrite.
Your version of full rewrite seems like it is generally applicable, but I have very little faith in the latter approach.
[+] [-] francasso|3 years ago|reply
1) A rewrite from scratch is almost always a bad idea, especially if the business side is doing just fine. By the way, when you want to sell a rewrite, you don't sell a rewrite, you sell an investment in a new product (with a new team) and a migration path; it's a different mindset, and you have to show business value in the new product (still ends up failing most of the time, but it has a better chance of getting approved).
2) You never ever try to change people (or yourself) directly. It's doomed to failure. You change the environment, then the environment changes the people (if the changes are slow and inertia is working for you, otherwise people just leave).
Since it would probably be too hard to change the environment by yourself, and given that your team seems fine with the status quo, my advice is to just manage things as they are while you look for another job. Otherwise my bet is that your life will be miserable.
[+] [-] ruskyhacker|3 years ago|reply
This isn't going to come off nicely, but your assumption that it needs a full rewrite, is in my eyes a bigger problem than the current mess itself.
The "very junior" devs who are "resistant" to change are potentially like that in your view for a reason. Because of the cluster they deal with I suspect the resistance is more they spend most of their time doing it XYZ way because that's the way they know how to get it done without it taking even more time.
What it sounds like to me is that this business could utilize someone at the table who can understand the past, current, and future business - and can tie those requirements in with the current environment with perhaps "modernizing" mixed in there.
[+] [-] corytheboyd|3 years ago|reply
So uh, good luck. You're going to be the one everyone hates.
I'd just quit in your shoes, to be completely honest. Your desire for a solid foundation will never be seen as anything but a roadblock to an organization that just wants more floors added to the house with reckless abandon for safety.
Any securities gained by improvements you champion will go unnoticed. You will be blamed when the inevitable downtime from molding a mountain of shit into less of a mountain of shit happens.
You are going to lose this fight. Please just quit and go work for a software engineering organization, you seem to have taken a job at a sausage factory for some reason. I'd also try to learn from that...
Good luck.
[+] [-] Scarblac|3 years ago|reply
In my view, as long as management believes this, a fix is not possible at all.
You should forget about improving the code but see your job as a kind of consultancy thing where you teach management about what they have and what the consequences of that are.
And probably look for a new job. If you are completely successful with teaching management, it may be working on this, but it'd probably need to be renegotiated as if it were a new job