top | item 29010307

Refactoring vs. defactoring

53 points| todsacerdoti | 4 years ago |understandlegacycode.com | reply

43 comments

order
[+] lucasmullens|4 years ago|reply
> Prefix your commit message with R for refactoring and C for the other kind of changes

No one would know what R and C mean, and I'd have to teach that to the whole team. Just use the word "Refactor" if you're refactoring as the first word of the commit, and don't use that word for everything else. You can't just start using your own made up abbreviations without getting buy-in from other people who work on the codebase.

[+] omegalulw|4 years ago|reply
> Just use the word "Refactor" if you're refactoring as the first word of the commit,

+1. I would say having refactor in commit header is fine too, though it's more natural to start the sentence with refactor.

[+] drewcoo|4 years ago|reply
Made me think "Hungarian commit messages."
[+] t-writescode|4 years ago|reply
This is a good article and I like the general idea it provides. I want to add a small disagreement, though.

> With the Refactoring hat , you are not allowed to change the way the code works.

This is the only part I have a small disagreement with. Sometimes, the act of refactoring it _does_ change how the code works, but it's a change in how the code handled ambiguous or extremely complicated states.

Sometimes refactoring results in streamlining an if-statement / configuration-detail mess and it's the right time to fix it.

It may be way, way, way extra work to put all the bugs intentionally back into the code in the refactor, and you may not even put the bugs back in correctly.

[+] nicbou|4 years ago|reply
While refactoring, I have to understand what the component does or should do. I read the original code and sometimes the commit history for certain lines. Then I write tests to ensure those assumptions carry to the new code.

More than once, I ended up finding bugs in the old code this way. The ticket said one thing (ticket IDs in commit messages are a godsend), and the code did another thing. There was also a lot of unexpected business logic that was discovered after the first draft of the code, and not documented.

I had the same experience translating text content into calculators. Text can be ambiguous, but code can't. Text had errors that coded logic wouldn't allow.

[+] Jtsummers|4 years ago|reply
In most of the examples of refactoring there's a strong distinction made between changing how something works internally and changing its behavior.

The behavior is its external behavior, not internal. As a somewhat trivial example, in Common Lisp you may use an association list like:

  '((a . b) (c . d) (e . f))
This is grossly inefficient for some operations, but functions as a map. You realize, "Oh, we have alists with, potentially, thousands of elements or have to search it over and over, it's killing our performance". So you replace that with a hash table to get that nice O(1) performance on a lot of your operations. Your users though, never see a behavioral change because they still access the data using the same methods as before:

  (get-value map 'a)
To them, there has been no change, and consequently for behavioral tests there has also been no change. It performs better (so there is a change, obviously) but it does not impact the guarantees you offered users of the unit.
[+] chriswarbo|4 years ago|reply
Similar to the defactoring-then-refactoring mentioned in the article, one approach I find useful is moving things towards point-free style ( https://en.wikipedia.org/wiki/Tacit_programming ), which can discard lots of extraneous layers and plumbing. Then variables and abstractions can be introduced, to avoid the more horrible aspects of point-free code (e.g. `(.) . (.)` sillyness)
[+] Zababa|4 years ago|reply
> Automated refactorings are fast and safe—even when you don’t have tests.

Depends on the refactor. At work some people tried to do some automated refactoring with Resharper, which changed names of fields of a class, which broke stuff that relied on them for reflection.

[+] xg15|4 years ago|reply
I think that's more an argument against using reflection though. If you use reflection, you're effectively changing the rules of the language. No automated tool will be reliable after that.
[+] splittingTimes|4 years ago|reply
Great point. We use method/class names in config files that get parsed and dynamically evaluated. You could break the whole application by changing class names as an init config could fail.

Also persisting stuff in customer DBs that depends on class names (and reflection) is nice source of terror. Shipping a new version of your software with a refactored class names corrupts the whole DB of a customer.

[+] enb|4 years ago|reply
Use nameof instead of a string literal to refer to names of types
[+] postalrat|4 years ago|reply
I think these types of articles get written after people realize how massive testing makes real refactoring harder, not easier.

To them the problem must be the way people refactor because it can't be the way things are tested.

[+] mikewarot|4 years ago|reply
I hadn't thought of separating out changes vs refactoring, it seems like a good practice to try out. Thanks!
[+] jakub_g|4 years ago|reply
It's generally a good idea for each commit (or even PR) to be isolated unit of work on its own. Makes much easier to review, and isolate regressions.

- refactoring: one commit

- fixing a bug: one commit

- fixing another unrelated bug: another commit

- new feature: another commit

In old version control systems it might have been difficult to do this, but with git it's a breeze given some practice: even if you're in a middle of something, you can always stop working, do a temporary commit or stash; then checkout main branch, create a new branch; do the thing you want to do (refactoring or bugfix) in isolation; commit; then go back to the other branch, and rebase.

There's also git add -p for making partial commits if you did a lot of changes and want to commit only some of them.

[+] darwingr|4 years ago|reply
The is actually already a specific word for code changes to behaviour that contrasts well with the structural change of refactoring: Transformations.

The word fits well in to TDD's red-green-refactor with the "Transformation Priority Premise". The premise goes as follows: as the tests get more specific, the code becomes more generic. https://en.wikipedia.org/wiki/Transformation_Priority_Premis...

[+] darepublic|4 years ago|reply
seems like a lot of work to split up your commits like this. PR means -- trust no one. the truth is out there (in the code).
[+] Chris_Newton|4 years ago|reply
I don’t try to separate every last code change that doesn’t affect behaviour, but I often find it easier to put major refactorings in their own commits. This has at least two advantages: it means you know there shouldn’t be any test changes showing up in the refactor commits, and it means there is less to understand and review in other commits that do intentionally change the system’s behaviour.

It’s not really a lot of extra work to do that once you get into the habit. Tools like `git add -p` to stage partial changes in files can be useful for separating self-contained refactorings from behaviour changes if you realise they’re happening at the same time and starting to get tangled.

[+] codyjones|4 years ago|reply
Someone has to read and like your PR to approve it. Presenting your work as a series of simpler steps helps the PR recipient (and you!) mentally verify it.
[+] cratermoon|4 years ago|reply
Not nearly as much work as untangling the changes-that-broke-something from the changes-that-made-the-code-better. If the first commit is behavior-preserving and the next commit is the one that changes the behavior, it's easy to revert the breakage but keep the improved code.

The other choice is to roll the whole thing back or practice forward-fixing. The latter is an entire discipline in itself.

[+] mikewarot|4 years ago|reply