Git rebase: what can go wrong

I like how Atlassian puts it:

> The golden rule of rebasing

> Once you understand what rebasing is, the most important thing to learn is when not to do it. The golden rule of git rebase is to never use it on public branches.

https://www.atlassian.com/git/tutorials/merging-vs-rebasing#...

For me, even though rebasing comes with some trappings, I still greatly prefer it to the alternative, which is to have merge commits cluttering up the commit history.

The way I phrase and teach what I consider to be the important rule of git is:

> Don't rewrite history on shared branches without proper communication.

I don't teach "never", I don't teach that `main` is special, I don't teach that force pushing is forbidden, because I don't believe in those things.

I strongly prefer a rebase-heavy workflow. Besides not "cluttering" the history, it's an invaluable tool for keeping commits focused at "the right level" of atomic change.

Squash merges cut down the noise considerably.

I don't mind merge commits; it's the 100 tiny individual commits some developers seem to like making that really clutter things up. Yes, I know, squashing is a thing, but so is not committing until the feature is working and ready to commit.

> For me, even though rebasing comes with some trappings, I still greatly prefer it to the alternative, which is to have merge commits cluttering up the commit history.

The purpose of history is to remember. Rewriting history, whether in git or in life, is bad, even setting aside the "don't use it on public repos" caveat. That advice is like saying "only point the shotgun away from you when firing": if you have to remember such a rule, it's best to avoid the thing altogether.

> I still greatly prefer it to the alternative, which is to have merge commits cluttering up the commit history.

I've heard this many times before, but haven't been able to figure out why this is a problem. In your workflow is it a problem to have a cluttered commit history? If so, could you explain how?

> I still greatly prefer it to the alternative, which is to have merge commits cluttering up the commit history.

GitHub recently added a feature that prompts people to update their branches via merge. It's frustrating because every PR now has dozens of merge commits polluting the history.

I find it fascinating that people talk about "Having a history of what people did" in such emotive terms - "Cluttering", "Polluting".

What matters is that you end up with working systems. That a lot of change happened is just, well, what happened. It doesn't need to be prettied up and made to look like your development occurred in a clockwork march of cleanliness. It literally does not matter unless you spend a lot of time doing git-bisect.

Let it go. Accept that coding is not a smooth, robotic, endeavour, where everything is always tidy. And that's just fine.

I accepted this a decade ago. I put my ego aside, and now I don't care if my git history doesn't look "beautiful" in the commit graph.

I've worked on dozens of projects since then and probably made thousands of commits. Some of those teams had dozens of developers working concurrently on the same codebase. We always merged the upstream branches into our development branches and never rebased.

I have NEVER ended up in a situation where I thought rebases would have been better. The git tools and IDE integrations of our current age allow me to find any information I need from the history without pain.

The point of a clean git history is not to have a clean git history. The point is to make it possible to debug later, via bisect, or show, or even just a diff. The point is to make the workspace clean for the next guy.

Instead of letting it go, maybe we should have more discipline and organization in our lives and not less.

> What matters is that you end up with working systems. That a lot of change happened is just, well, what happened. It doesn't need to be prettied up and made to look like your development occurred in a clockwork march of cleanliness. It literally does not matter unless you spend a lot of time doing git-bisect.

And git blame. And git checkout to a past state. It "doesn't matter" only if ease of understanding your project history doesn't matter.

I think if the definition of a “good history” is “clean and not messy”, then yes I agree that’s pointless. If the definition is “a clear ability to see what changes were made, by who, and most importantly why” I think that’s incredibly necessary and would even go so far as to say it’s naive at best to not support.

The amount of time that has been saved in my life by someone leaving an explanation in their commit (for some weird edge case or context I’d have no way of gleaning because they’ve since left the company) is SO much more than the extra time I’ve put in to make sure the history has this extra info in it.

What's worse, the desire for cleanliness ends up making things like `git bisect` less useful.

If I had a bad day and introduced something stupid, I want a bisect to point me at the code I wrote on that bad day. If you squash liberally, perhaps because you want each commit to correspond to a release note, you're going to lose that debugging granularity.
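A minimal sketch of the bisect point, assuming a test command that exits 0 on a good revision and non-zero (other than 125, which `git bisect run` treats as "skip") on a bad one. `find_first_bad` is a hypothetical wrapper around the real `git bisect start`, `git bisect run`, and `git bisect reset` subcommands; the finer-grained the commits, the smaller the diff it lands on.

```shell
# Hypothetical helper: print the first commit between <good> and HEAD for
# which the given test command fails. Usage: find_first_bad <good> <cmd...>
find_first_bad() {
  good=$1
  shift
  # Mark HEAD as bad and <good> as good; git then checks out midpoints.
  git bisect start HEAD "$good" >/dev/null 2>&1 || return 1
  # git runs the command at each midpoint and marks it good or bad.
  if ! git bisect run "$@" >/dev/null 2>&1; then
    git bisect reset >/dev/null 2>&1
    return 1
  fi
  # Once bisect run has finished, refs/bisect/bad is the first bad commit.
  bad=$(git rev-parse refs/bisect/bad)
  # Go back to the branch that was checked out before bisecting.
  git bisect reset >/dev/null 2>&1
  printf '%s\n' "$bad"
}
```

For example, `find_first_bad v1.0 make test` (the tag and make target are placeholders) would print the ID of the first commit at which `make test` fails.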

The git history of a project is the main source of knowledge on that project, once the people that wrote it are gone. The git history answers questions such as "wtf is that supposed to do?", "what's this code connected to?", and "why did they do it that way?". You can use other kinds of documentation, but the git history is always there, so it makes sense to make it semi-useful.

This is such a strange thing to say. I'd be curious if you feel the same way about cleaning up your code, or cleaning up your room. I think you have an unfair advantage in this argument because it's difficult to defend such intangible benefits. We have to resort to making up logical explanations, or sounding unhinged or emotional as you suggest.

But it's simply intangible. My instinct tells me that it's helpful and that's okay. I don't owe anyone a justification for how I organize things, and there's nothing controversial about this. (Or maybe I could even come up with a logical example of a benefit, but that's a trap I'm not going to fall into) And a lot of people agree, and they know what I mean, so it's not merely an individual preference. If I have to work with someone who has strong preference against it I'll worry at that point about negotiating.

A clean git history on a pull request also makes it easier for the reviewer to understand your code. Small, concise commits tell reviewers about your train of thought and the issues you ran into, making it easier to pick up the context. I start every code review by looking at the commit history.

I prefer not to have squash commits in our team for this reason. It makes master look good, but usually nobody ever looks at the master commit history first, they look at the merged pull requests. However, everybody must look at the commits you made in a pull request. If you have squash commits, you are encouraged to have messy commit history in your pull requests, leading to meaningless commit messages and even large commits (causing other problems...).

IMO the only advantage of squashing is that it makes it easy to roll forward when you accidentally deploy something that causes problems.

Agreed, and let's also avoid having the CI pipeline create commits in the remote repo. I like CI/CD to be stateless with regard to the files in the repository. I tried to plead for this today with my colleagues, with very mixed results.

It’s ego

I’ve never understood the tradeoff of rebasing, squashing or otherwise “keeping a clean history”. It always seemed like tons of sometimes highly error prone work (sometimes you can wipe out a colleague’s work with it! Wtf!), for almost no gain (why does it matter that the git history is “clean”?).

It matters because when I:

* use filtering commands like "git log -S"

* press the "annotate" button in my IDE and can see which commit introduced each line

* run "git bisect"

* use "tig" to drill down through the history of a file (shortcut "," is "move to commit preceding current line's blame commit")

...every step of the way, I get a meaningful description of why a change was made and what other diffs were necessary to achieve that change, not just "fix", "bug", or "PR comments".
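Two of those lookups as shell functions, a sketch with hypothetical names: `why_line` goes from a line to the message of the commit that last touched it (roughly what the IDE's annotate button shows), and `who_touched` lists the commits that changed how often a string appears (`git log -S`, the "pickaxe"). Both are only as useful as the commit messages they print.

```shell
# Hypothetical helper: print the commit that last touched line <n> of
# <file>, with its full message. Usage: why_line <file> <n>
why_line() {
  # The first line of --porcelain output is "<sha> <orig> <final> <count>".
  sha=$(git blame --porcelain -L "$2,$2" -- "$1" | head -n 1 | cut -d ' ' -f 1)
  git log -1 --format='%h %an: %s%n%n%b' "$sha"
}

# Hypothetical helper: list the commits that changed the number of
# occurrences of <string> (git log -S, the pickaxe). Usage: who_touched <s>
who_touched() {
  git log -S "$1" --format='%h %s'
}
```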

> why does it matter that the git history is “clean”?

Makes reviewing a set of changes prior to a merge much easier. It's nice if there's a 1:1 correlation between a commit message and the actual patch contents.

I'm sure you've dealt with reviewing a colleague's changes where the commit message says "Enable logging in foobar module" and the patch actually enables foobar logging and a bunch of other stuff.

This makes bisecting your git history to identify and fix bugs much more difficult.

If the git history is clean, you can just read the commit messages and implicitly trust the developer if clean git hygiene is in place (as opposed to actually needing to read the whole diff on a per-commit basis to find out what _actually_ happened at commit XYZ, despite its message).

For me the big gain is at the code review stage. It's much easier to review a set of patches that are a clear and distinct sequence of changes without "oops, fix bug" changes later in the series. It does require extra work by the code author, but it means less work for the code reviewer. Depending on the project and the organisation and the workflow, that can be a worthwhile tradeoff.

Never understood why you wouldn't want it clean. There's no benefit whatsoever to it being messy and it's a liability for a lot of reasons, whereas the clean version is free and easy and makes everything you do that interacts with git history simpler.

Another thing is if you keep commits in a clean "state", it is easier to revert a commit, when you squash or keep them messy it can make it harder to revert.

Also sometimes you decide you want to backport some change to other releases, and if commits are in a good state, it is much easier to do this.
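Backporting a focused commit can be sketched like this, assuming the fix lives in a single commit and the release branch already exists locally (`backport` and the branch names used with it are hypothetical):

```shell
# Hypothetical helper: backport one focused commit to a release branch.
# -x appends "(cherry picked from commit <sha>)" to the message, so the
# backport can be traced back to the original change.
# Usage: backport <sha> <release-branch>
backport() {
  sha=$1 release=$2
  git checkout -q "$release" && git cherry-pick -x "$sha"
}
```

The smaller and more self-contained the original commit, the more likely the cherry-pick applies cleanly on an older branch.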

Sometimes when working on an old code base built by developers that came and went, one needs to perform what I call "code archeology": going back in time to understand why a feature was implemented the way it was.

Whether this is feasible at all depends largely on the care developers put into structuring their commits.

When an engineer made a change is of no consequence to me. When it got merged into the main branch does matter a whole lot if you're doing trunk-based development.

> sometimes you can wipe out a colleague’s work with it! Wtf!

I’m not necessarily on Team Rebase, but isn't this just as likely with merging gone wrong?

It makes your git graph instagrammable.

It's for humans. You can more easily cycle to a specific point. I find linear history easier to comprehend. But it's not like a game ender. People will do whatever they will.

I also find it easier to run git bisect (git's binary search) on a history like this.

> keeping a clean history

This being a principal reason for VCS, I very much understand the motivation.

I love rebase (I'm a tip-of-master-only person, no merges ever, squash all your commits with `rebase -i` before pushing and write one good commit message for the group). But there's one really, really irritating thing about them:

You should not be able to use `--amend` during a rebase.

For me editing all my changes onto the commit I'm working on with `git commit -a --amend` (or as I've aliased it, `gcaa`) is automatic; I do it 500 times a day, just to save my work. But I can't count how many times I've been in the middle of squashing commits and accidentally typed `gcaa` and amended someone else's commit after fixing a merge conflict, and it's super annoying to unwind (if you realize after typing `rebase --continue`) so usually I end up just giving up and starting over. I really wish amending to a commit that wasn't one of the ones you're rebasing was just totally disabled.

I guess there are some other small complaints, like the annoying reversing of `--ours` and `--theirs` from what makes sense (yes, it makes sense if you have the internal model of rebase instead of the intuitive one, but that's stupid), rebase's tendency to pick the wrong parent commit if you've accidentally amended someone else's commit (and therefore lag a while and then produce a rebase log of 1000 commits or something), and the utter tedium of editing the rebase log to replace every instance of "pick" with "s" for squash except the first, since almost 100% of the time what I want to do is squash everything (and use the last commit message, not the first, and definitely not all of them munged together which is the default).

I would love a separate command or a flag, like "git rebase --tip" that does all of this automatically for my otherwise extremely elegant workflow (and I'm gonna be really bummed if it turns out it exists and I didn't know about it for the last 5 years...).
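For the squash-everything-and-keep-the-last-message case, one way to skip the todo list entirely is a soft reset followed by a single commit. This swaps in a different technique from `rebase -i`; it is a sketch with a hypothetical `squash_onto` function, assuming your branch's own commits are exactly those after `git merge-base HEAD <upstream>`:

```shell
# Hypothetical helper: collapse every commit since the merge-base with
# <upstream> into one commit that reuses the *last* commit's message.
# Usage: squash_onto [<upstream>]   (default: origin/main)
squash_onto() {
  upstream=${1:-origin/main}
  base=$(git merge-base HEAD "$upstream") || return 1
  # Move the branch back to the merge-base while keeping the index and
  # working tree as they are at the current tip. Anything staged but not
  # yet committed is swept into the squashed commit too.
  git reset --soft "$base" || return 1
  # git reset saved the old tip in ORIG_HEAD; -C reuses its message
  # (and its author and author date) without opening an editor.
  git commit -q -C ORIG_HEAD
}
```

A follow-up `git rebase origin/main` then replays a single commit, and `git commit --amend` rewrites the message if the last one wasn't the good one.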

Random thought: given you already have the gcaa alias, perhaps you could include a check that .git/REBASE_HEAD doesn't exist in that?

Probably easiest as a little shell function like

    gcaa() {
      local GIT_DIR
      if ! GIT_DIR=$(git rev-parse --git-dir); then
        return 1
      elif test -f "$GIT_DIR/REBASE_HEAD"; then
        printf 'Rebase in progress: commit --amend is disabled\n' >&2
        return 1
      fi
      git commit -a --amend "$@"
    }

rather than an alias?

[Edit] I forgot about rev-parse --verify, which simplifies this further:

    gcaa() {
      if git rev-parse --verify REBASE_HEAD >/dev/null 2>&1; then
        printf 'Rebase in progress: commit --amend is disabled\n' >&2
        return 1
      fi
      git commit -a --amend "$@"
    }

This also leaves you still able to use commit --amend long-hand if (for example) you want to edit one of your own commits during rebase -i.

I’m curious how this workflow differs from `git merge --squash`.

> accidentally typed gcaa and amended someone else's commit after fixing a merge conflict

You could try reverting the first commit on the HEAD once you finish the rebase. This is of course assuming your branch and the last commit don't touch the same files.
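A different technique from reverting, assuming you notice before running `git rebase --continue`: the HEAD reflog still has the commit as it was before the amend, so a soft reset to `HEAD@{1}` undoes the amend and leaves your conflict resolution staged. `undo_amend` is a hypothetical name:

```shell
# Hypothetical helper: undo an accidental "git commit --amend".
# Right after an amend, HEAD@{1} is the commit as it was before amending;
# a soft reset moves HEAD back there and keeps the amended changes staged.
undo_amend() {
  # Refuse unless the most recent HEAD move was an amend.
  case $(git log -g -1 --format=%gs HEAD) in
    "commit (amend):"*) ;;
    *) printf 'last HEAD move was not an amend\n' >&2; return 1 ;;
  esac
  git reset --soft 'HEAD@{1}'
}
```

After that, `git rebase --continue` commits the staged resolution as the commit being replayed, as it would have without the amend.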

Since everyone is bringing up squashing...

There's a false dichotomy nobody addresses here, which is the notion that there needs to be such a thing as "the" history for you to get the benefits of a clean history.

If all you really want is a linear history, then just do merges, and make sure the "first parent" is the main branch (which you can enforce with tooling). Now you can just traverse solely the (linear!) sequence of first parents, which is exactly the same view squashing would have given you, except without the information loss.
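A sketch of that setup, assuming the main branch is called `main` (`merge_feature` and `mainline_log` are hypothetical names; enforcing first-parent merges with server-side tooling is not shown):

```shell
# Hypothetical helper: merge <branch> into main so that main is always the
# first parent. --no-ff creates a merge commit even when a fast-forward
# would be possible, so every feature lands as one first-parent step.
merge_feature() {
  git checkout -q main && git merge -q --no-ff -m "Merge $1" "$1"
}

# Hypothetical helper: the linear view of <branch> (default: main),
# following only first parents. Each merged feature appears once, as its
# merge commit; the feature's own commits stay reachable for archeology.
mainline_log() {
  git log --first-parent --format='%h %s' "${1:-main}"
}
```

`git blame --first-parent` and `git bisect start --first-parent` (Git 2.29 and later) can work on the same first-parent view.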

If for some reason you can't stand the idea of something branching off your main branch at all, then set up a separate job that automatically squashes everything onto a branch that only it can write to (or branch from). Now you have a truly linear history with nothing branching off it, exactly as you would've had with squashing. And you can always reproduce it on demand.
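One way to build that squashed branch, sketched as a hypothetical `linearize` job: it walks the first-parent chain of the source branch and recreates each commit with `git commit-tree`, keeping the tree and message but giving it the previous linear commit as its only parent. Copying author, committer, and dates from the original commits is what makes reruns produce identical commit IDs.

```shell
# Hypothetical job: rebuild <dest> as a strictly linear branch with one
# commit per first-parent commit of <src>, i.e. one per merged feature.
# Usage: linearize <src> <dest>   (<dest> must not be checked out)
linearize() {
  src=$1 dest=$2
  tip=$(
    parent=
    for c in $(git rev-list --first-parent --reverse "$src"); do
      # Copy identity and dates from the original commit. --date=raw
      # prints "<unix-seconds> <tz>", a format GIT_*_DATE accepts.
      GIT_AUTHOR_NAME=$(git log -1 --format=%an "$c")
      GIT_AUTHOR_EMAIL=$(git log -1 --format=%ae "$c")
      GIT_AUTHOR_DATE=$(git log -1 --date=raw --format=%ad "$c")
      GIT_COMMITTER_NAME=$(git log -1 --format=%cn "$c")
      GIT_COMMITTER_EMAIL=$(git log -1 --format=%ce "$c")
      GIT_COMMITTER_DATE=$(git log -1 --date=raw --format=%cd "$c")
      export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE \
        GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL GIT_COMMITTER_DATE
      tree=$(git rev-parse "$c^{tree}")
      msg=$(git log -1 --format=%B "$c")
      # Same snapshot and message; the previous linear commit becomes the
      # only parent (the first one gets no parent at all).
      parent=$(git commit-tree ${parent:+-p "$parent"} -m "$msg" "$tree") ||
        exit 1
    done
    printf '%s\n' "$parent"
  ) || return 1
  git update-ref "refs/heads/$dest" "$tip"
}
```

Running `linearize main main-linear` after each merge would keep `main-linear` in sync. The exports happen inside a subshell, so they don't leak into the caller's environment, and `update-ref` moves the destination branch without touching any working tree.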

That way you avoid the information loss, and can always do archeology on the full evolution graph if needed.

If I were to write a blog post on this I’d make a few do’s and don’ts (why make a blog post when you can blog in HN comments?)

Don’t merge the base branch into a feature branch. Rebase to “update”.

Do use rerere and the curse of fixing the same conflict over and over is (almost) gone.

Don’t rebase (or force push for other reasons) a shared branch. Rule of thumb here is you can probably rewrite history if you work with _one_ coworker in a branch but any more than that and you’re more likely than not to upset someone.

Do rebase -i HEAD~N to reorder/reword/squash into easily reviewable sequences of commits.

Don’t force push after review has started, until the review is complete. This keeps the history of the review process, and you can later fold the fixups into the commits they logically belong to right before merging.

Do use Merge, Squash, and “Rebase+FF” as appropriate when merging a PR. There is no best solution for every scenario, so prescribing “always merge”, “never merge”, or similar isn’t helpful. A good rule of thumb, though, is that IF a branch has merged from the parent branch to update (which I suggested was a “don’t”), avoid merging it back; a branch updated that way is better squashed when merging back.
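The fixup advice in that list can be sketched with two real git features, `git commit --fixup` and `git rebase --autosquash`. `fold_fixups` is a hypothetical name, and its argument is assumed to be the commit your branch started from:

```shell
# During review: record each fix as a "fixup!" commit aimed at the commit
# it belongs to, and push it as a plain new commit (no force push needed):
#   git commit --fixup=<sha-of-commit-being-fixed>

# Hypothetical helper, run after review: fold every fixup! commit into its
# target in one non-interactive pass. GIT_SEQUENCE_EDITOR=: accepts the
# todo list that --autosquash has already reordered, without an editor.
# Usage: fold_fixups <base>
fold_fixups() {
  GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash "$1"
}
```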

> fixing the same conflict repeatedly is annoying

This is usually caused by merging an upstream branch (e.g. develop) into your feature branch and then later trying to rebase it.

Effectively the commits you've merged in from develop undo the changes you've made in your feature branch. You fix them but the foreign commits undo the changes again.

The solution is actually pretty easy. Use git rebase --interactive to remove any commits from the rebase that aren't directly part of the feature work.

You may still have an odd merge conflict to fix but you'll only have to do it the once and everything should go smoothly.

I would also recommend never using the same commit message twice. When you have a list of 10 commits all called "Wip" it's hard to tell which are obviously duplicates that can be deleted.

Git rebase is stupid; I’ve seen countless f-ups because someone needed the git history to look good.

Mercurial fixes pretty much all of these problems via changeset evolution: commits are marked as obsolete and the obsolescence marker says which commit replaces the obsolete commit. So you have a meta-graph of commits as they change. You can therefore undo, you can trace the history, and since obsolete commits aren't shared by default, they slowly fade away.

https://www.mercurial-scm.org/doc/evolution/

It's a good idea, and there has been an attempt to port it to git:

https://lwn.net/Articles/914041/

There are just too many things you have to know about which commits are what and what's going on - especially with a larger team - it's not a good use of time imho to be fixing rebases when they go wrong. Like the big list of dos and don'ts at the end of this should be a red flag.

Better alternative I've found is squash merge: topic/feature branches are squashed into a single commit instead of bringing over each individual commit or creating a merge commit. Your history is cleaner, you're able to revert stuff easily, and it's really hard to mess up since it's just an atomic last step in your workflow.
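A sketch of what the squash-merge step does locally (hosting platforms' "squash and merge" buttons do the equivalent on the server); `squash_merge` is a hypothetical name:

```shell
# Hypothetical helper: squash-merge <branch> into the current branch.
# --squash stages the combined changes without creating a merge commit or
# moving HEAD; the follow-up commit is an ordinary single-parent commit.
# Usage: squash_merge <branch> [<message>]
squash_merge() {
  git merge -q --squash "$1" || return 1
  git commit -q -m "${2:-Squashed merge of $1}"
}
```

Because the result has a single parent, `git revert <that-commit>` backs out the whole feature in one step.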

The trick to using `rerere` with `rebase` is to merge first, resolve the conflict, record the resolution, then go back and do the rebase. It's explained here:

https://www.git-scm.com/book/en/v2/Git-Tools-Rerere

It's often easier to resolve a conflict during a merge than during a rebase because it presents you with left, right, and the common ancestor. You're also only looking at the tips of each branch. With rebasing, you're replaying each commit one on top of the next so you lose the common ancestor information and you may also have conflicts that won't exist at the end.

Another tip: if the other branch has changed a lot since you last rebased, even a single merge may have a lot more conflicts than you want to deal with all at once. In this case, consider a series of intermediate merges since you're going to throw them all away anyway.
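The merge-then-rebase trick as a throwaway, self-contained demo. The hand resolution is scripted with `echo` so it runs unattended, and `rerere_demo` is a hypothetical name; run it as `rerere_demo "$(mktemp -d)"`, since it `cd`s into the directory you give it and creates a repo there.

```shell
# Hypothetical demo: record a resolution during a throwaway merge, then
# let rerere replay it during the real rebase.
rerere_demo() {
  cd "$1" || return 1
  git init -q -b main
  git config user.name demo
  git config user.email demo@example.com
  git config rerere.enabled true

  # One line changed differently on two branches: a guaranteed conflict.
  echo 'greeting = hello' > app.cfg
  git add app.cfg && git commit -qm base
  git checkout -q -b feature
  echo 'greeting = hello feature' > app.cfg && git commit -qam feature
  git checkout -q main
  echo 'greeting = hello main' > app.cfg && git commit -qam main
  git checkout -q feature

  # 1. Throwaway merge: it stops on the conflict, which rerere records.
  git merge main >/dev/null 2>&1 || :
  # 2. Resolve by hand (scripted here), then commit; committing is the
  #    point where rerere records the resolution.
  echo 'greeting = hello both' > app.cfg
  git add app.cfg && git commit -q --no-edit
  # 3. Drop the merge commit and do the real rebase. rerere rewrites
  #    app.cfg with the recorded resolution, but the path still counts
  #    as conflicted until you review it and "git add" it.
  git reset -q --hard HEAD~1
  git rebase main >/dev/null 2>&1 || :
  git add app.cfg
  GIT_EDITOR=true git rebase --continue >/dev/null 2>&1
}
```

Setting `rerere.autoUpdate` to `true` would make git stage rerere's resolutions automatically, at the cost of making them easier to overlook.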

* Regarding "splitting commits in an interactive rebase is hard" - I actually use `git reset` (to unmake a commit) followed by several instances of `git add -i` (to add individual changes into a commit) + `git commit` (to actually make the commits). If the commits to be split are in a middle of something, it's possible to do all of this inside a `git rebase -i`...

...which is exactly what is suggested in the section linked in the article, https://github.com/kimgr/git-rewrite-guide#split-a-commit.

* Regarding "weird interactions with merge commits" - `git rebase --rebase-merges` tends to help most of the time, since, during a rebase, merge commits are skipped by default (even if they contain changes).
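The reset-and-re-add approach from the first bullet can be scripted. This hypothetical `split_head_by_file` swaps the interactive `git add -i` for one commit per touched file, and assumes a clean working tree, a non-root HEAD, and no whitespace in paths. Inside `git rebase -i`, mark the commit `edit`, run it when the rebase stops, then `git rebase --continue`.

```shell
# Hypothetical helper: split the HEAD commit into one commit per file it
# touched, each with the original subject plus the file name.
split_head_by_file() {
  msg=$(git log -1 --format=%s HEAD)
  files=$(git diff-tree --no-commit-id --name-only -r HEAD)
  # Undo the commit but keep its changes in the working tree.
  git reset -q HEAD~1 || return 1
  for f in $files; do
    # "git add" also stages the deletion of a tracked file that is gone.
    git add -- "$f"
    git commit -q -m "$msg ($f)"
  done
}
```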

> force pushing makes code reviews harder

On any code base I've worked on that's larger than a small FOSS project, I've found that this simply isn't avoidable. Yes, there's merge commits but, for reasons I won't go into, I think those are worse than the alternative of rebasing and making code reviews difficult.

> One way to avoid this is to push new commits addressing the review comments, and then after the PR is approved do a rebase to reorganize everything.

Not realistic when working on a code base where PRs are being squash-merged every hour and the code review lasts for days.

The best middle-ground is to avoid rebasing until the current wave of feedback has been resolved, even if no one has actually approved yet.

Using squash merges has reduced my need to rebase a lot. I don't really care if I have merge commits on a feature branch for a PR if there's a reasonable history on the main branch when I'm troubleshooting an issue with git blame.

Why squash merges? I have a number of team-mates who make local commits on feature branches that make the history look like a series of less-than-useful commit messages. E.g. wip, wip, wip, wip, make it work, wip, wip. All of the context for the change is actually on the PR, so it's really only helpful to see the PR message and have a link to the PR for the discussion on the change.

git rerere only "automates" conflict solving after you already solved it. As in, it remembers previous merge resolutions, even if you undo the merge/rebase.

It is particularly useful when doing difficult merges regularly. Invariably I'll find a mistake in the merge and start over (before pushing, obviously); the second "git merge" remembers the previous resolutions so I don't have to solve all the same conflicts again.

Similar for difficult rebases that may need multiple attempts.

Git remembers resolutions across branches and commits, so in the rare case where (say) a conflict was solved during a cherry-pick, rerere will automatically apply the same resolution for a merge with the same conflict.

I think the reason it's not on by default is that the UI is confusing: when rerere solves for you, git still says there is a conflict in the file and you have to "git add" them manually. There is no way of seeing the resolutions, or even the original conflicts, and no hint that rerere fixed it for you.

You just get a bunch of files with purported conflicts, yet no ==== markers. Have fun with that one if you forget that rerere was enabled.

"Don't make me think" has been my best principle of coding for a long time. Looking at how much thinking overhead rebase produces, I'd prefer to avoid it.

I'm jealous of people who enjoy rebasing. Such a simple life they must lead. When I'm tasked with rebasing a feature branch with 1,000 commits, written by 10 different people, onto a new release branch with another 1,000 new unrelated commits, written by 10 different people, I really start to question my life choices.

I hope Julia keeps writing about git, because I'm sure it will teach me something!

I'm still searching for a way to manage long-lived Postgres submissions, the most challenging git scenario I've encountered. Julia's post finally got me to brain-dump my current process, something I've meant to write down for a while now:

https://illuminatedcomputing.com/posts/2023/11/git-for-postg...

This link could almost be an "Ask HN": if any of you have suggestions to improve my workflow, I'm all ears. (I asked around a bit last May at PGCon, but didn't get any concrete advice there. Maybe it's too complicated for a hallway off-the-cuff discussion.)

I prefer rebase over merge because after a merge the upstream commits end up on top of my branch, so when I add new commits afterwards I find it harder to understand what the current pull request is adding.

One thing that annoys people when rebasing: if you need to rebase a couple of times and you also change the commit history of the current branch, you might end up resolving the same conflicts repeatedly. To avoid this, you can use git rerere, which saves your conflict resolutions and, if the same conflict comes up again, resolves it automatically: https://mirrors.edge.kernel.org/pub/software/scm/git/docs/gi...

I've largely come to a workflow of creating a feature branch and periodically merging main out to that branch. When it's done, I use the github squash and merge feature to bring changes back in. Cutting non-ancestor rebases out of my workflow has been great for my personal sanity.
