I'll highlight something I've learned from both succeeding and failing at this: when rewriting something, you should generally strive for a drop-in replacement that does the same thing; in some cases that even means matching it bug-for-bug, or, as in the article, taking a very close look at the new bugs versus the old ones.
It's tempting to throw away the old thing and write a brand new bright shiny thing with a new API and a new data model and generally NEW ALL THE THINGS!, but that is a high-risk approach, usually without a correspondingly high payoff. The closer you can get to a drop-in replacement, the happier you will be. You can then separate the risk of deployment from the risk of the shiny new features/bug fixes you want to deploy, and since risks tend to multiply rather than add, anything you can do to cut a risk into two halves is almost always a big win, even if the "total risk" is in some sense the same.
It took me a lot of years to learn this. (I'm currently paying for the fact that I failed to do a correct drop-in replacement, because I was replacing a system with no test coverage, no official semantics, and not even agreement among all its consumers about what it was and how it worked, let alone how it should work.)
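To make the "risks multiply" point concrete, here is a toy calculation (all numbers invented for illustration):

```python
# Invented, independent failure probabilities for two kinds of change.
p_rewrite_breaks = 0.10  # chance the drop-in replacement misbehaves
p_feature_breaks = 0.10  # chance the shiny new feature misbehaves

# Shipped together, the deploy is clean only if *both* parts work, and a
# failure can't be pinned on either part without bisecting the release.
p_any_failure = 1 - (1 - p_rewrite_breaks) * (1 - p_feature_breaks)
print(round(p_any_failure, 2))  # 0.19

# Shipped as two separate deploys, the total exposure is comparable, but
# each rollout is verified (and rolled back) on its own, so a failure is
# isolated to one change instead of smeared across two.
```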
This is probably very context-dependent, because I've learned the opposite.
For example, I was rewriting/consolidating a corner of the local search logic for Google that was spread throughout multiple servers in the stack. Some of the implementation decisions were clearly made because of the convenience of doing so in a particular server. But when consolidating the code into a single server, the data structures and partial results available were not the same, so reproducing the exact same logic and behavior would have been hard. Realizing which parts of the initial implementation were there for convenience, and which were there for product reasons, let me implement something much simpler that still satisfied the product demands, even if the output was not bitwise identical.
> ...that is a high-risk approach, usually without a correspondingly high payoff
That's probably true, but it's also true that over a long enough timescale (100 years, to trigger a reductio ad absurdum) there is a very high risk that not replacing or rewriting that code will sink your technology and possibly your organization.
Just because the risk will be realized in the long run doesn't mean it's not a risk. And if the worst-case scenario is death of the entire organization, then the math could very well add up to a full rewrite. Most business managers are not prepared to think strategically about long-term technical debt. It's the duty of engineers to let them know the difference between "not now" and "never". And the difference between "urgent" and "low priority".
I've learned a variation on this theme, which is more specific:
You have some code that, for whatever reason, you think is not very good. There is a reason why you have ended up with code that is not very good.
If your action is to sit down and write it all again, you should anticipate getting a very similar result (and this has been the outcome of every "big rewrite" effort I've ever seen: they successfully reproduced all the major problems of the old system).
The reasons why this happens are probably something to do with the way you're doing development work, and not with the codebase you're stuck with. Until you learn how to address those problems, you should not anticipate a better outcome. Once you have learned how to address them, you are likely to be able to correct the problem without doing a "big rewrite" (most commonly by fixing things one piece at a time).
Sometimes I see people attempt a "big rewrite" after replacing all of the people, thinking the new team can do a better job. The outcome, as far as I can tell, is invariably that the second team, building the system with no real experience, ends up following a very similar path to the first team that did the same thing (guided by the map the first team left them, and again reproducing all the same problems).
From these observations I draw one key conclusion: the important thing you get from taking smaller steps is that you amplify your ability to learn from what has already been done, and avoid repeating the mistakes that were made last time. The smaller the step, the easier it becomes to really understand how this went wrong last time and what to do instead. Yes, the old codebase is terrible, but it still contains vitally important knowledge: how to not do that again. Neither writing new code from scratch without touching the old, nor petting the old code for years without attempting to fix it, is an effective way to extract that knowledge. The only approach I've ever really seen work is some form of "take it apart, one piece at a time, understand it, and then change it".
I think what works even better is to have "permission" to gradually change both the old and the new. It can drastically simplify the process of creating a replacement, if you're only replacing a slightly more sane version of the original instead of the actual original.
The strategy of proxying real usage to a second code path is incredibly effective. For months before the relaunch of theguardian.com, we ran traffic to the old site against the new stack to understand how it could be expected to perform in the real world. Later of course we moved real users, as incrementally as we possibly could.
The hardest risk to mitigate is that users just won't like your new thing. But taking bugs and performance bottlenecks out of the picture ahead of time certainly ups your chances.
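The mechanics of that kind of shadowing are simple enough to sketch (names and backends are hypothetical; in production the shadow call would run off the request path, e.g. on a background thread or queue):

```python
def handle(request, old_backend, new_backend, report):
    """Serve users from the old stack while shadowing traffic to the new one."""
    response = old_backend(request)       # users only ever see this result
    try:
        candidate = new_backend(request)  # new stack exercised by real traffic
        if candidate != response:
            report(request, response, candidate)
    except Exception as exc:              # new-stack failures must never leak out
        report(request, response, exc)
    return response
```

Months of real traffic through a comparator like this surfaces both correctness and load problems before any user is moved over.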
Out of curiosity - when you've done this type of proxy test, what do you do about write operations? Do you proxy to a test DB, or do you have your code neatly factored to avoid writing on the test path (I guess most code I've worked on that needed a rewrite also wasn't neatly factored :) ).
This is tangential, but given the increasing functionality and maturity of libgit2, I wonder if it would yet be feasible to replace the Git command-line program with a new one based on libgit2, and written to be as portable as libgit2. Then there would be just one Git implementation, across the command line, GUIs, and web-based services like GitHub. Also, the new CLI could run natively on Windows, without MSYS.
While I think the libgit2 initiative is fantastic, I don't think there needs to be just one Git implementation.
One of my favourite things about git is that the underlying storage and protocol is really simple and straight-forward to implement. You could do a lot of it in shell scripts, if you wanted to.
The stateless storage is simple and consistent, but the thing that does vary is the various operating algorithms: diff, merge, garbage collection, etc.
This creates a really interesting ecosystem where you could potentially have third-party tools that have some secret sauce producing more efficient diffs or fewer conflicting merges but still base it entirely on the open git ecosystem and remain completely backwards-compatible with all the other tooling.
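To make the "really simple storage" claim concrete: a git object id is just a SHA-1 over a tiny type-and-size header plus the content (loose objects are then zlib-compressed on disk). A sketch in Python:

```python
import hashlib

def git_blob_oid(content: bytes) -> str:
    """Compute the object id git assigns to `content` stored as a blob."""
    header_and_body = b"blob %d\x00" % len(content) + content
    return hashlib.sha1(header_and_body).hexdigest()

# Matches `echo 'hello' | git hash-object --stdin`
print(git_blob_oid(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

Trees, commits, and tags use the same scheme with different type names, which is why so many independent tools can interoperate on the same repository.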
How does Scientist work with code that produces side effects? In the example, presumably both the new and old each create a merge commit. Maybe these two merge commits are done in in-memory copies of the repo so that the test result can just be discarded, but what about in the general case where a function produces an output file or some other external effect?
I would think that the operation would either (a) have to be pure or (b) be executed in two different environments. I think going for (a) is the easier approach. If you produce an output file, make a pure operation that generates the contents, then write it as a subsequent operation. Now you can test the contents against each other, but only actually write one of them.
Basically, create an intermediary object that represents your state change and test those. Then "commit" the change from control and discard the one from the experiment.
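A sketch of that shape (all names hypothetical): both implementations produce a pure value representing the pending change, the values are compared, and only the control's value is actually committed.

```python
mismatches = []

def log_mismatch(inputs, control, candidate):
    # Stand-in for whatever mismatch reporting you use.
    mismatches.append((inputs, control, candidate))

def render_report(data):
    # Old implementation: a pure function from data to file contents.
    return "\n".join(f"{k},{v}" for k, v in sorted(data.items()))

def render_report_v2(data):
    # Candidate rewrite, also pure, so it is safe to run on every request.
    lines = [f"{k},{v}" for k, v in sorted(data.items())]
    return "\n".join(lines)

def write_report(path, data):
    control = render_report(data)
    candidate = render_report_v2(data)
    if candidate != control:
        log_mismatch(data, control, candidate)
    with open(path, "w") as f:
        f.write(control)  # only the control's output ever hits disk
```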
I am trying to understand why the new merge method needed to be tested online via experiment. Both correctness and performance of the new merge method could have been tested offline working with snapshots (backups) of repos. Could a github engineer shed more light here?
Author here. 5 years ago I would have agreed with you and logged e.g. 10 million merge requests to replay them offline. But one thing I've found over the years (which may seem obvious in retrospect) is that staging environments are not identical to production. Particularly not when it comes to finding sneaky bugs and performance regressions: the code doesn't run in the same exact environment it will run in once deployed (it sees different inputs and, most importantly, different load and performance characteristics).
The question then becomes "why would you run these experiments offline when you can run them online?". So we simply do. I personally feel it's a game changer.
If you read about what they're doing, they basically are doing that. The tests are run independently of the production code, and production is just providing the test cases.
Speculation, but if they already have the infrastructure to run the test online then it was probably easier than building one-time-use tools to test backups.
Seems like the biggest takeaway is "have good tooling and instrumentation". I'm working with a complicated legacy production system, trying to rebuild pieces of it, and we have little or no instrumentation. Even _introducing_ such tooling is a potentially breaking change to production systems. A pity.
Very cool. I like this parallel execution of the original version and the update, with comparisons between the two. They use a Ruby package developed in house that has been made open source, Scientist. Does anyone know if there is a similar package for Python (preferably 2.7) development? It seems like an interesting area in between unit tests and A/B tests.
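I don't know of a canonical Python port offhand, but the core of the pattern is small enough to sketch yourself. This is my own minimal version, not the Scientist API (Python 3 shown; for 2.7 you'd swap `time.perf_counter` for `time.time`):

```python
import random
import time

def science(name, use, try_, publish):
    """Run control and candidate in random order, publish the comparison,
    and always return (or raise) exactly what the control did."""
    legs = [("control", use), ("candidate", try_)]
    random.shuffle(legs)  # avoid systematically favoring one leg's warm caches
    results = {}
    for label, fn in legs:
        start = time.perf_counter()
        try:
            results[label] = {"value": fn(), "error": None}
        except Exception as exc:  # candidate bugs must not break callers
            results[label] = {"value": None, "error": exc}
        results[label]["duration"] = time.perf_counter() - start

    publish(name, results)
    control = results["control"]
    if control["error"] is not None:
        raise control["error"]  # only the control's failures propagate
    return control["value"]
```

The published results are where the interesting work happens: mismatch rates and timing distributions, graphed over time, are what told the GitHub folks when the rewrite was safe.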
> Finally, we removed the old implementation — which frankly is the most gratifying part of this whole process.
On average, I get much more satisfaction from removing code than I do from adding new code. Admittedly, on occasion I'm very satisfied with new code, but on average, it's the removing that wins my heart.
TIL that github used to merge files differently than git because it used its own merge implementation based on git's code, to make it work on bare repos. Showcases a benefit of open formats and open source, showcases a downside as well (I'd never guess it might merge differently.)
It's a good thing nobody contributes to my GitHub repos, since no one had the chance to run into the issue...
I wish they would add the ability to fast-forward merge from pull requests. I know many large projects (including Django) accept pull requests but don't merge them on Github simply because of the mess it makes of the history.
In my team's projects, code review in PRs made a mess of the history (certain devs in particular :P). We switched to a squash merge based workflow to address it, git reflow is our particular poison: https://github.com/reenhanced/gitreflow
This is inspiring reading. One may not actually need the ability to deploy 60 times a day in order to refactor and experiment this effectively, but it's clearly a culture that will keep velocity high for the long-term.
In the pursuit of getting things done we forget fundamentals, and more often than not it's the sound fundamentals that come in handy once your product has grown beyond your original one- or two-person tech team.
For operations that don't have any side effects, I can definitely see how you could use the Science library.
I'm curious though if there are any strategies folks use for experiments that do have side effects like updating a database or modifying files on disk.
The first thing to do is to try to minimize the scope of mutating operations: e.g, decompose a monolithic read-write operation into independent load, transform, and store. You can then easily test as much as possible (the load and the store) side-by-side. The parts that must have side effects are still hard, but at least there are fewer of them. (One variant of this would be to build your code such that you can always intercept side-effects, and then block the new code's side effects and compare with the old code)
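A sketch of that decomposition (all names hypothetical): the pure load and transform steps run for both implementations and get compared; the single mutating store step runs once, with the control's result.

```python
def load(feed):
    # Side-effect-free read: safe to run as many times as we like.
    return list(feed)

def transform(rows):
    # Old pure transform.
    return [row.strip().upper() for row in rows]

def transform_v2(rows):
    # Candidate rewrite of the transform; also pure, so safe to shadow.
    return [row.upper().strip() for row in rows]

def sync(db, feed, report):
    rows = load(feed)
    control = transform(rows)
    candidate = transform_v2(rows)
    if candidate != control:
        report(rows, control, candidate)  # compared offline, never stored
    db.extend(control)  # the only side-effecting step, executed once
```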
The next thing is to try to take advantage of idempotence (and, by extension, try to make as many of your mutating operations idempotent as possible): there's nothing completely risk free, but you can at least verify idempotence for either ordering of new and old code, and if you're factored right, you can run both paths on the same input and verify they have the same side-effects and output.
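Verifying idempotence mechanically is cheap once the operation is factored out. A generic checker (sketch; `make_state` builds a fresh copy of the relevant state):

```python
def is_idempotent(op, make_state):
    """Check that applying `op` twice leaves the same state as applying it once."""
    once = make_state()
    op(once)
    twice = make_state()
    op(twice)
    op(twice)
    return once == twice

# A "set this field" operation is idempotent; an "append" operation is not.
print(is_idempotent(lambda s: s.update(email="a@b.c"), dict))  # True
print(is_idempotent(lambda s: s.append("a@b.c"), list))        # False
```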
Finally, making the observation that the new code must in general be backwards-compatible with the old code, and both versions need to be able to run concurrently (because this situation will always exist during deployment): in the worst case, you can always start with a limited deployment of the new path, which limits the amount of damage done if the new code is bad.
Point the control to the real database, and the experiment to an unused database. After each request through Scientist, the two databases should be identical.
Or, make the output be the SQL command used to mutate the database. If the two outputs are different, then you've found something that needs investigation.
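In other words, make the mutation itself the comparable value. A sketch (statement and schema invented): both implementations return a statement plus parameters, the pair is diffed, and only the control's statement is ever executed.

```python
def update_email(user_id, email):
    # Old implementation: returns the statement it *would* run.
    return ("UPDATE users SET email = ? WHERE id = ?", (email, user_id))

def update_email_v2(user_id, email):
    # Candidate rewrite, built the same way.
    return ("UPDATE users SET email = ? WHERE id = ?", (email, user_id))

def run(db, user_id, email, report):
    control = update_email(user_id, email)
    candidate = update_email_v2(user_id, email)
    if candidate != control:
        report(control, candidate)  # divergence caught before any data changes
    db.execute(*control)            # only the control's statement mutates the db
```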
My read of the article implies that they were running the new method on the side, and comparing the results to the old method that was still running in production. They got to 100% before they actually pulled the lever on what customers would use.
Edit: Ah, I see - talking about the Git bugs, not the differences.
I’m actually not surprised that “256 (or a multiple) merge conflicts” was never noticed (or at least root-caused and fixed) by the entire git community.
Wonderful ability to use a large userbase as a giant fuzzer.
Nothing really to contribute or ask, other than to say that I really enjoyed the writeup. Although I have nothing coming up that would use the code, the new library sounds really neat. Kudos!
Humans will always reverberate around truths like this.
The emphasis shift on breaking vs fixing looks like a good example of how fashion trends in tech create artificial struggles that help new people understand the "boundaries" of $things.
Fashion's like a tool for teaching via discussion
Edit: I'm just commenting on what I perceive as a fashionable title, not the article.
Does anyone know what an "O(n) issue" is? I can think of a few possible meanings in the usage here, but I've never heard it before and they all seem wrong.
The word "debt" is not just a financial term. There are debts of gratitude, debts to society, debts of honour, and so there are also technical debts.
Objecting to the name "technical debt" on the basis that it is not the correct financial use of the term is like objecting to the name "work day" on the basis that it isn't measured in joules. It's a category error.
colanderman|10 years ago
As we speak, I'm "replacing" old code by just writing a wrapper around it with the new API it should have.
Then I'll rewrite it without the wrapper, bug-for-bug.
And then I'll actually fix the bugs.
AnimalMuppet|10 years ago
There's always more corner cases handled by the existing code than you think there are.
There's always more bug fixes in the existing code than you think there are.
Combine all of those, and writing a replacement is always much harder than you expect it to be.
geoka9|10 years ago
Do they ever? Why change the part users are used to?
solutionyogi|10 years ago
If libgit2 is fully mature, I can imagine more GUIs/tools will be built to manage/analyze the git repositories.
slimsag|10 years ago
I've always had to report issues to GitHub via email as they do not have a public issue tracker (something I've always found a bit ironic).
dlib|10 years ago
Any chance GitHub is at some point going to show the specific merge conflicts for a PR that cannot be merged?
jcchee88|10 years ago
I could see this being OK in most cases where speed is not a concern, but I wonder what we can do when we do care about speed.