top | item 11734911

Your project's RCS history affects ease of contribution (or: don't squash PRs)

43 points| JoshTriplett | 10 years ago |mjg59.dreamwidth.org | reply

28 comments

[+] ereyes01|9 years ago|reply

Your commit history should be readable like your code and your tests. It's part of the final product you deliver to the rest of your team. It's easy to forget or ignore your commit history, which is probably what leads people to encouraging squashing all commits in a PR. Ugly commit history is a code smell.

I agree with the OP that it's a bad blanket policy. We are fortunate that git has powerful abilities to edit commit history (git rebase -i is my favorite). Rather than just requiring everything to be squashed as a matter of policy, maintainers should instead request that commit history be cleaned up if it's not presentable, IMO.

[+] newjersey|9 years ago|reply

> Your commit history should be readable like your code and your tests. It's part of the final product you deliver to the rest of your team. It's easy to forget or ignore your commit history, which is probably what leads people to encouraging squashing all commits in a PR. Ugly commit history is a code smell.

> I agree with the OP that it's a bad blanket policy. We are fortunate that git has powerful abilities to edit commit history (git rebase -i is my favorite). Rather than just requiring everything to be squashed as a matter of policy, maintainers should instead request that commit history be cleaned up if it's not presentable, IMO.

For what it is worth, I think history is sacred. Never rewrite history (except in extraordinary circumstances and that too very rarely). Of course, do whatever you want on your local machine before pushing to remote but once you have pushed, stop worrying about it.

[+] ikawe|9 years ago|reply

On the other hand, a couple years down the road, a generic pull-request-level comment (added feature foo) can be more useful than a bunch of noisy commits like

    - WIP: feature foo
    - fixed typo
    - whoops! 
    - working
    - reformatted

As is often the case with new contributors to open source projects.

In theory --no-ff merges give you the best of both worlds, but in practice, I traverse git history with 'git blame' or 'git log -p <filename>' neither of which do a good job of surfacing "this commit was merged as part of feature/foo".

Of course, all the info is there... But I haven't yet found a good UI for it.

[+] geofft|9 years ago|reply

I have the following aliases in ~/.gitconfig:

    [alias]
        log0 = log --first-parent -m
        show0 = show --first-parent -m
        blame0 = blame --first-parent -m

These aliases treat history as if each merge commit introduced the changes directly, and didn't have the individual history. (Even for one-patch PRs, this means that the log/blame view will point me to a commit with a link to the pull request, which means I can see any discussion of the PR.) For instance, `git show` on a merge commit will typically show no output (unless there was a merge conflict), but `git show0` will show all changes introduced by the branch being merged in, as if they had all been squashed down.

My experience is that I've really appreciated projects with a policy of mandatory merge commits (e.g. Rust; every commit is tested and merged by the CI bot, and humans only push to master in emergencies) with these in place.

[+] fennecfoxen|9 years ago|reply

In theory, this is why we rebase our history and make an elegant series of coherent patches. At least if "we" are the Linux dev team or something.

[+] nfm|9 years ago|reply

I think everyone would agree that the above commit history is bad, and should be avoided.

But solving that issue by adopting a squash-on-merge policy is like doing surgery with a chainsaw. There are better tools for the job.

[+] Upas|9 years ago|reply

There is definite value in having more granular commits, but at the same time, there are commits that don't really belong in the commit history.

While it might take more time than squashing everything, git rebase -i really helps a ton in these situations. I can fixup all the "fixed typo" type of commits into a meaningful commit, and even reword commits if necessary.

[+] nas|9 years ago|reply

I've had exactly this experience multiple times. Having nice logical commits makes it a heck of a lot easier to port features around to different branches later on (if you decide you want to). It makes 'git annotate' much more useful. It makes 'git bisect' more useful. It makes backing out a feature easier.

I don't want to see some massive 3 month worth of feature hacking, 5000 line modification get squashed down into a single commit. I want it broken into the smallest possible working and incremental changes that transform the old code into the new. Each step should make sense for the code maintainers who come along later and try to figure out what you did and why something unexpected broke. ;-)

I don't want to see 200 noisy commits like you say. I've worked with code bases that used both of these styles. The former is vastly superior in my experience.

[+] MaulingMonkey|9 years ago|reply

The problem I find is that squashing those commits together will mean I can't easily figure out what typo was fixed in commit 2, because of the diff-spam reformatting of commit 5. This makes it that much harder to figure out that the typo was a variable name in a dynamic language where all use cases on-branch were fixed, but other use cases (e.g. in my own personal private branch) were not.

I care much more about the actual mechanical code change which broke my stuff, than the feature or branch that happened to contain that mechanical code change.

The same problems apply to reviewing the code in the first place. I'll gladly review 10x the code if it's split into nice tiny micro-commits, even messy ones, than try and suss out exactly all the moving parts that changed from a squashed diff to make sure all potential concerns are addressed. Commit #2 might trigger a "email your coworkers to alert them of the breaking change when submitting" review note, whereas I'm liable to miss the fact that you fixed a typo in the first place in a squashed commit.

[+] marssaxman|9 years ago|reply

Some people clearly pay a great deal more attention to commit history than I ever do. The only difference this makes to me is that adding more hoops I have to jump through before you'll accept my patch makes it less likely that I will bother to submit one in the first place.

[+] jonaf|9 years ago|reply

Regardless of your opinion, if you have any interest in code quality (hint: you should), then you should respect the contributing guidelines for the project you are contributing to.

Your argument / opinion as stated could have "commit history" replaced with "unit tests" and in many circles (for some reason), people are still making that argument. For a project to be healthy, a critical eye must be used for every change: from the code, to the tests, to the commit message. If you don't care about any one of these, you are saying you don't care about the quality of the software. I would deny your PR on principle. My project needs more quality than it needs more contributors.

Note that I did not suggest squashing is good or bad. My opinion on that is not related to the fact that you should care about it and treat it as just as important as your contribution, and you should respect the project enough to abide their guidelines for contribution (unless your contribution is revising those guidelines).

[+] richmarr|9 years ago|reply

I've let at least one PR go stale after having made the maintainer's requested changes, only to find a secondary list of demands such as squashing and 'comments too long'. Way to suck all the enthusiasm out of contributing.

[+] geofft|9 years ago|reply

I do feel like this sort of thing should be the maintainer's responsibility. I'm pretty familiar with `rebase -i`, but only because I've done a bunch of work with git. Random contributors are as likely as not to be brand new to git. It should be reasonable to tell the maintainer, "Here's my branch, feel free to rebase or squash as you see fit."

Of course, I think the same is true with things like comments, obvious coding style fixes, tests (especially if the maintainer has more access to test infrastructure), etc. I can see the advantage of getting people to do it themselves if the maintainer is busy or especially if the contributor is planning on sticking around for a while, but surely the maintainer can ask "Hey, would you <...>? If not, I can do it."

[+] Manishearth|9 years ago|reply

You should squash PRs. What you shouldn't do is indiscriminately squash PRs. Having a bunch of "oops, fix" and review fix commits lying around makes it harder for everyone too. It also affects bisection much more. "don't squash PRs" is just bad advice; it's more nuanced than that.

Split your work into bite sized pieces, and amend/squash the relevant fixups into these before merging.

Since these tasks can need knowledge of rebase, I offer to do them for new contributors too, which usually fixes that issue.

[+] ashitlerferad|9 years ago|reply

Use `git rebase -i` but don't squash PRs.

[+] geuis|9 years ago|reply

I cannot disagree with this more. Commits should be feature-based, with thorough commit notes. A feature can be a one line change all the way up to something affecting hundreds of lines (or more).

High quality code standards (good variable naming, plentiful comments, etc) tell you what's happening at the code level. The commit history tells you what's happening at the feature level.

Don't pollute history with noise.