Understanding the Git Workflow

[+] pilif|14 years ago|reply

Learning about "rebase -i" and "add -p" changed how I think about commits. I learned how easily I could keep the history clean and, conversely, the huge value that a clean history has for maintenance.

Now, building the commits as self-contained entities that don't break the build in between not only helps me while searching for bugs later on, it sometimes helps me detect code smells around unneeded dependencies.

That said, I still like to merge big features with --no-ff if they change a lot of code and evolved over a long time, as that, again, helps keep history clean because a reader can clearly distinguish code before the big change from code after the big change.

Of course the individual commits in the branch are still clean and readable, but the explicit merge still helps if you look at the history after some time.

"you said 'a long time in development' - surely the merge target has changed in between. Why still --no-ff?" you might ask.

The reason, again, is clean history: before merging I usually rebase on top of the merge target to remove any bitrot and to keep the merge commit clean. Having merge commits with huge code changes in them which were caused by fixing merge conflicts, again, feels bad.

But this is certainly a matter of taste.
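
A minimal sketch of the rebase-then---no-ff flow described above, run in a throwaway repository. The branch name big-feature, the file names, and the commit messages are all made up for illustration.

```shell
# Throwaway repository so the sketch is safe to run anywhere.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"

echo base > base.txt; git add base.txt; git commit -q -m "Initial commit"

# Long-running feature branch made of self-contained commits.
git checkout -q -b big-feature
echo one > feature1.txt; git add feature1.txt; git commit -q -m "Feature: part 1"
echo two > feature2.txt; git add feature2.txt; git commit -q -m "Feature: part 2"

# Meanwhile the merge target moves on.
git checkout -q master
echo fix > fix.txt; git add fix.txt; git commit -q -m "Unrelated fix on master"

# Rebase first, so any conflict fixes land in the feature commits themselves...
git checkout -q big-feature
git rebase -q master

# ...then record an explicit merge commit that carries no changes of its own.
git checkout -q master
git merge -q --no-ff -m "Merge big-feature" big-feature

git log --oneline --graph
```

The graph should show the two feature commits as a side branch joined by a single merge commit, which is the "before and after the big change" marker.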

[+] false|14 years ago|reply

You will probably enjoy 'checkout -p' and 'reset -p' as well (revert and unstage changes hunk-by-hunk)

[+] diminish|14 years ago|reply

just like you, i enjoy rebase -i to change history, but I also hear some people claim the history should be kept as it is and should not be rewritten. What are your arguments for rebase?

[+] decklin|14 years ago|reply

The idea that fast-forward merges are easier to follow is subjective. I find my --no-ff history easier to read. This author doesn't.

What always using fast-forward merges really means is that you rebase each branch onto master once it's ready to be public. Therefore, instead of resolving conflicts when the branch is merged, the commits are rewritten to avoid introducing the conflict in the first place.

Sometimes, this is really simple -- I added a line in one spot, you added another line in the same spot, you merged first, so I rewrite my commit to add my line next to yours instead of merging and resolving the conflict. Sometimes, it's not -- maybe there's not even any text-level conflict, but your feature and my feature interact in subtle and unanticipated ways and something breaks. Now, there's no "good" point in my branch to refer to, because I rewrote it on top of something where (I didn't realize) it was never really going to work. The unit test I now need couldn't have existed because it involves things that, when I was developing the branch, didn't exist.

Rebasing first is trading off when you do that work. There's more to review when the branch is ready, and there's a stronger incentive to get it right the first time. I think this may work better for the "two founders deploying from master when they feel like it" scenario -- you pay for manageability with context switches. If you have a formal QA process, I think being able to distinguish between "this branch failed QA" and "the combination of these branches failed" may be more helpful -- you can parallelize work and hack on a different private branch.

Git, thankfully, does not force us to choose one model or the other :-)
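
For comparison, here is a rough sketch of the "always fast-forward" model: the branch is rewritten on top of master and then merged with --ff-only, so no merge commit appears. The branch and file names are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"

echo base > base.txt; git add base.txt; git commit -q -m "Initial commit"

git checkout -q -b topic
echo topic > topic.txt; git add topic.txt; git commit -q -m "Topic work"

git checkout -q master
echo other > other.txt; git add other.txt; git commit -q -m "Someone else's change"

# Rewrite the topic commits on top of the new master...
git checkout -q topic
git rebase -q master

# ...so that master can simply be moved forward.
git checkout -q master
git merge -q --ff-only topic   # fails instead of creating a merge commit

git log --oneline --graph
```

The resulting history is a straight line; the original, pre-rebase topic commit is no longer reachable from any branch, which is the trade-off discussed above.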

[+] sandofsky|14 years ago|reply

In my experience, on large distributed projects the person integrating changes into master is rarely the same person who authored the change.

For example, when Linux branches are pulled upstream, if your code creates a conflict your branch will just be rejected and you'll be told to fix it.

Rebase forces the author to solve more of these problems before submitting their change for integration.

I don't think rebase is an end-all solution for the reasons you've described. It's perfect for medium sized changes you can easily verify afterwards. My day-to-day work usually falls into this category.

In the case of larger sets of all-or-none changes, such as a site redesign, it makes perfect sense to maintain a parallel line of development. Cleanup probably isn't worth it, and the separate branch serves as documentation. You should consciously create a new public branch.

In this case, I can understand wanting a "no-ff" merge for documentation. I think you should first consider tags, but sometimes it makes sense to set a stake in the ground with a placebo commit.

The problem is that if you use "no-ff" all the time on trivial changes, then these branches lose meaning.

This post wasn't supposed to be an embargo on "no-ff." My case is that people default to "no-ff" to pave over deeper issues.

[+] sunchild|14 years ago|reply

This opened my eyes a bit. I am a walking, talking git anti-pattern today. I'm mostly on a two-man team, so I can get away with it. I'm definitely going to start thinking more about a clean history on master.

What are some other best-practice git workflows that HN readers use?

[+] gruseom|14 years ago|reply

I work this way and agree about the value of a clean, linear history. It makes working with past versions of your code a breeze. There's one thing the OP doesn't mention that I've found important.

Say you're working on a major design change in a private branch and it has 100 commits. When it's ready to be put on top of master, you'd really like not to squash all 100 commits. Unfortunately, if there are conflicts, then rebasing B1,B2,...,B100 onto master is likely to be much harder than squashing B1,...,B99 into B100 and then rebasing. Why? In the squashed case you only have to deal with conflicts between B100 and master, while in the unsquashed case you have to deal with all the conflicts that ever existed as you progressed from B1 to B100. It's frustrating to find yourself fixing conflicts in code that you know doesn't exist any more. It's also error-prone since it forces you to remember what you were doing at all those steps. In such situations, I give up and squash. That's not great either, since you now have the disadvantages of a single monolithic commit.

The solution is to be diligent about rebasing B onto master as frequently as master changes, so B never has a chance to drift too far afield. This at least gets rid of the worst pain, which is conflicts that compounded unnecessarily. It also keeps you aware of what's happening on master.

[+] js2|14 years ago|reply

Here's a trick for you: make sure you have rerere enabled. Merge the end commit, resolve all the conflicts and commit the merge (or just run rerere to record the conflict resolution). Then abort the merge or reset back to undo it. Now do the rebase, which will re-use the resolutions for any identical conflicts. You still have to deal with conflicts unique to the intermediate state, but in my experience rerere helps a lot.
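
Roughly, the trick could look like this in a toy repository with a single conflicting file. The file contents and branch name are invented, and GIT_EDITOR=true is only there so the sketch runs without opening an editor.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"
git config rerere.enabled true            # record and replay conflict resolutions

echo "line" > file.txt; git add file.txt; git commit -q -m "Initial commit"

git checkout -q -b feature
echo "feature line" > file.txt; git commit -q -am "Feature edit"

git checkout -q master
echo "master line" > file.txt; git commit -q -am "Master edit"

# 1. Trial merge: resolve the conflict once so rerere can record it.
git checkout -q feature
git merge master || true                     # stops with a conflict in file.txt
echo "feature and master line" > file.txt    # hand-written resolution
git add file.txt
git commit -q -m "Trial merge"               # the resolution is recorded at this point

# 2. Undo the trial merge.
git reset -q --hard HEAD~1

# 3. Rebase: the same conflict comes up and rerere fills in the recorded resolution.
git rebase master || true                    # still stops, but file.txt is already resolved
git add file.txt
GIT_EDITOR=true git rebase --continue        # keep the original commit message

cat file.txt
```

With more than one commit on the branch, only conflicts that match the recorded one are replayed; intermediate-state conflicts still need manual work, as noted above.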

[+] pflanze|14 years ago|reply

I've always been an extensive user of rebase -i. Committing partial work often using git commit -a is easier, or at least takes less concentration, than always being careful to commit selectively with git add -p, git commit $files, but it needs squashing of those partial commits later on. I found that git rebase -i wouldn't scale to several days worth of work: I would frequently make errors when dealing with conflicts, and restarting rebase -i from scratch would mean redoing much of the work.

Because of this, I wrote a tool[1] that lets me do the same thing as git rebase -i, but allows me to edit the history changes incrementally, by keeping the history edit file and conflict resolutions around between runs; it does this by creating git patch files from all commits in question. I now always use this whenever I need to do more than one or two changes on some history; also, I'm now often creating commits to just store a note about a thought/idea/issue (the tool automatically adds a tag to the original history head, so I can look at those later on).

I originally wrote this just for me, which is the reason its own history isn't particularly clean and that I'm relying on a set of never-released libraries of mine; also maybe there are other, perhaps more well-known or polished tools than this, I don't know. I guess I should announce this on the Git mailing list to get feedback from the core devs.

[1] https://github.com/pflanze/cj-git-patchtool

/plug

[+] simonw|14 years ago|reply

This is the first argument for using rebase that I've found truly convincing - really worth reading. This will probably change the way I use git.

[+] eropple|14 years ago|reply

It wouldn't change mine, if I used git (I avoid git specifically for this reason, actually, and use Mercurial). If you're actually looking at your commit logs, I find that rolling back is trivial; I can't remember the last time I accidentally rolled back into an incremental commit.

Personally it feels more like an apology for git's bad behavior than a good method of development.

[+] alunny|14 years ago|reply

For very short, "oh there's a syntax error I missed" commits, "commit --amend" is very useful, and quicker than "rebase -i".

[+] daemin|14 years ago|reply

"git commit --amend" is very useful if you realise you forgot to include some files in the last commit.

Although if you committed since then you might be better off adding a new commit with the missing files and then doing a "git rebase -i" to move and squash the commits as appropriate.
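
A small sketch of the forgotten-file case, with made-up file names. It only covers the simple situation where the commit to fix is still the most recent one.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo a > a.txt
echo b > b.txt

git add a.txt
git commit -q -m "Add a and b"     # oops: b.txt was never staged

# Stage the missing file and fold it into the previous commit.
git add b.txt
git commit -q --amend --no-edit    # --no-edit keeps the existing message

git show --stat --oneline HEAD
```

The amended commit replaces the old one, so this is best kept to commits that have not been shared yet.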

[+] mark_story|14 years ago|reply

For the extra lazy, you can use git aliases to make `git amend` or `git ca`.
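
One possible way to set such aliases up is shown below. The comment above does not say what `git ca` expands to, so the definition here is a guess, and the aliases are stored in a throwaway repository rather than with --global to keep the sketch self-contained.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Repository-local aliases; adding --global would make them available everywhere.
git config alias.amend "commit --amend"
git config alias.ca "commit --amend --no-edit"   # assumed meaning of "ca"

echo a > a.txt; git add a.txt; git commit -q -m "Add files"

# Forgotten file: stage it and amend through the alias.
echo b > b.txt; git add b.txt
git ca -q

git log --oneline
```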

[+] zwieback|14 years ago|reply

Nice post, thanks.

I've been using traditional RCSs for years but find that whenever I introduce SVN (or CVS before that) to a team it's very easy for new users to fall into bad habits around branching and committing transitory changes.

I'd like to try git to help manage the mess during the prototyping phase but I'm wondering how suitable it is for new users to learn git vs. learning svn.

Any opinions out there on the suitability of git as a first version control system? My team consists of highly experienced engineers (EE/FW) with little or no software engineering experience.

[+] mooneater|14 years ago|reply

they sound like smart people. why hobble them with svn in 2011?

i put off the transition as long as i could out of inertia (switched from svn in 08 out of desperation when i started needing a lot of branch and merging). but once you go git, you dont look back, not one bit.

[+] access_denied|14 years ago|reply

[deleted]

[+] andrew311|14 years ago|reply

I'm wondering how people address one of the scenarios raised in the post, specifically this:

"It’s safest to keep private branches local. If you do need to push one, maybe to synchronize your work and home computers, tell your teammates that the branch you pushed is private so they don’t base work off of it.

You should never merge a private branch directly into a public branch with a vanilla merge. First, clean up your branch with tools like reset, rebase, squash merges, and commit amending."

I'm wondering how people address cleaning a private branch that has been pushed (when your goal is to get its changes into master cleanly). Rebasing the private branch is pretty much out of the picture since it has been pushed (unless you don't care about pushing it again). I can see some ways of doing this:

1) You could do a diff patch and apply it to master, then commit.

2) You could checkout your private feature branch, do a git reset to master in such a way that your index is still from the private, then commit it. Ex:

  # currently on private branch
  git reset --soft master

Now all the changes from the private branch are staged to be committed on top of master. This is easy, but it puts everything in one commit.
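
As a rough illustration of option 2, the sketch below does the soft reset on a copy of the private branch, so the branch that was already pushed stays as it is. The copy (called squashed here) is an addition for this example, not part of the original suggestion, and all names are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"

echo base > base.txt; git add base.txt; git commit -q -m "Initial commit"

# A messy private branch with several work-in-progress commits.
git checkout -q -b private
echo 1 > work.txt;  git add work.txt; git commit -q -m "WIP 1"
echo 2 >> work.txt; git commit -q -am "WIP 2"
echo 3 >> work.txt; git commit -q -am "WIP 3"

# Work on a copy so the pushed branch is not rewritten.
git checkout -q -b squashed private
git reset --soft master     # branch now points at master; index and files keep the private state
git commit -q -m "Add work as a single commit"

# master can now take the clean commit with a fast-forward.
git checkout -q master
git merge -q --ff-only squashed

git log --oneline
```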

If you wanted to do a few commits for different, but stable points, but you already pushed the private branch and can't rebase it, you could instead do "git reset --soft" on successive points in the private branch commit chain, committing to master as you go.

If you wanted to reorder commits from the private branch, I guess you could rebase the private branch (which means you can't push again since you pushed it already), then do the tactic from the last paragraph, then ditch the private branch cause it's no longer pushable.

Does anyone have better ways of putting changes to master for private branches that have already been pushed?

[+] gruseom|14 years ago|reply

Whether a branch is private and therefore can be rebased has nothing to do with whether there's a copy of it on the server. I push my work-in-progress to the server often for backup purposes anyway. If I want to rebase, I just push -f.

I can't think of why that would be a problem, but if someone objected to push -f on a private branch, I'd just make a new branch with a new name and push that. And if that were a problem, I'd just find another server to push -f to and only ever commit to master on the official server. But these are silly workarounds. Why make things harder than they need to be?
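
A sketch of that backup-then-force-push habit, using a local bare repository as a stand-in for the server. The branch name wip and the file names are invented.

```shell
remote=$(mktemp -d)
git init -q --bare "$remote"        # plays the role of "the server"

repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"
git remote add origin "$remote"

echo base > base.txt; git add base.txt; git commit -q -m "Initial commit"
git push -q origin master

# Push work-in-progress purely as a backup.
git checkout -q -b wip
echo draft > work.txt; git add work.txt; git commit -q -m "WIP"
git push -q origin wip

# Rewrite the branch locally (an amend here; a rebase has the same effect on the push)...
echo final > work.txt
git commit -q -a --amend -m "Add work"

# ...a plain push would be rejected as non-fast-forward, so overwrite the backup copy.
git push -q -f origin wip

git ls-remote origin
```

This is only safe when nobody else has based work on the branch, which is what "private" means in this thread.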

[+] motherwell|14 years ago|reply

https://github.com/nvie/gitflow works really well. The original post http://nvie.com/posts/a-successful-git-branching-model/ was really compelling, and using it has really helped, at least what I do.

[+] stretchwithme|14 years ago|reply

What really helped me grasp git was attending one of Scott Chacon's speeches on the topic. Scott works for github, knows what he's talking about and explains things thoroughly.

  http://www.youtube.com/watch?v=QF_OlomyKQQ

[+] joelhaasnoot|14 years ago|reply

Hmm, this makes sense to me: lots of Git features I'd forgotten or not used before.

Can anyone sketch my "merging" strategy I should be using in my scenario:

- Have 3 branches dev, stage and master
- Bugs are fixed on master, bigger bugs/changes on stage and new features on dev
- Big functionality changes/additions come in the form of new branches, which currently I first merge with dev, then with stage and if everything is OK, with master.

This doesn't always work well due to the timing of things: sometimes my dev branch is out of date with the master and needs fixes from the master before applying.

How should I handle merging the branches?

[+] cvandyck76|14 years ago|reply

I wouldn't have two separate branches for bugfixes and then one for new features - as you noted, it can get hairy. Personally I find the git-flow model very straightforward.

Do normal feature development and bug fixes on the develop branch; save master for production releases. When it's time to make a release, cut a release branch (e.g. r/1.0.1) from the develop branch. Bug fixes that are made on that branch should also be merged into develop. Once the release is made, merge r/1.0.1 back into master and develop and continue on as normal.

Also see: https://github.com/nvie/gitflow
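
The release cycle described above can be sketched with plain git commands (not the git-flow tool itself). The use of --no-ff for the two final merges, and the file names, are assumptions made for this example.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"

echo base > app.txt; git add app.txt; git commit -q -m "Initial release"

# Day-to-day work happens on develop.
git checkout -q -b develop
echo feature >> app.txt; git commit -q -am "New feature"

# Cut the release branch from develop and fix a bug on it.
git checkout -q -b r/1.0.1 develop
echo fix > fix.txt; git add fix.txt; git commit -q -m "Release fix"

# Meanwhile develop carries on.
git checkout -q develop
echo next > next.txt; git add next.txt; git commit -q -m "Next feature"

# Finish the release: merge into master, and back into develop so the fix is not lost.
git checkout -q master
git merge -q --no-ff -m "Release 1.0.1" r/1.0.1
git checkout -q develop
git merge -q --no-ff -m "Merge release 1.0.1 back into develop" r/1.0.1

git log --oneline --graph --all
```

After this, master holds only released work, while develop holds the release fix plus whatever came after the branch was cut.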

[+] mark_story|14 years ago|reply

I would create feature branches for anything bigger than a few commits. Once the branch is done, you can merge it into dev, then merge that same branch into stage/master if you want.

Typically I would try to merge feature->dev->stage->master. With issues found on stage those could be put directly onto stage and merged into master. I guess it depends where you base new branches off of and where the 'stable code' is.

I usually aim to merge less stable (softer) branches into more stable (firmer) branches. And base all new feature branches off of the most firm branch I have.

[+] Maro|14 years ago|reply

Great post. Calls attention to the importance of having clean, stable commits in the 'master' branch and thus avoiding plain vanilla 'git merge' in favor of 'squash' and 'rebase'.

http://stackoverflow.com/questions/2427238/in-git-what-is-th...
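
For reference, a squash merge might look like the following in a scratch repository; the branch and file names are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch master whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"

echo base > base.txt; git add base.txt; git commit -q -m "Initial commit"

# Private branch with checkpoint commits.
git checkout -q -b topic
echo a > a.txt; git add a.txt; git commit -q -m "WIP a"
echo b > b.txt; git add b.txt; git commit -q -m "WIP b"

# Bring the combined result onto master as one commit.
git checkout -q master
git merge -q --squash topic     # stages the changes but does not commit
git commit -q -m "Add a and b"

git log --oneline
```

master ends up with a single ordinary commit and no merge commit; the checkpoint commits stay on topic only.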

[+] swah|14 years ago|reply

He should start the article with the last paragraph.

[+] sandofsky|14 years ago|reply

People could then read the summary and skim through the rest. The summary is there just to help you remember.

If people don't internalize the reasoning, it's a disaster waiting to happen.

[+] echostar|14 years ago|reply

Under "Declaring Branch Bankruptcy", why does the author throw in a "git reset" as the last step in the example?

[+] endlessvoid94|14 years ago|reply

After reading this, I finally motivated myself to read through the man pages for git pull, fetch, merge, and rebase.

Thanks :-)

[+] trusko|14 years ago|reply

Good article. Thanks

[+] jebblue|14 years ago|reply

Git is plain scary. We should stick with SVN.

[+] pyre|14 years ago|reply

Personally I find the "git isn't enough like SVN" argument to be basically the "I stick with Windows because Linux is scary" or "MacOS X sucks because I tried it once and none of my Windows shortcut keys worked."

It basically comes down to:

1) You probably shouldn't just dive into git if you don't have someone that does know it to help you out (or unless you're willing to go seek out help from mailing lists, irc, etc).

2) You shouldn't assume that just because RCS/CVS/SVN are the only VCS's that you've ever used that it means that's what all VCS's should look/work like.

[+] j-kidd|14 years ago|reply

Try http://hginit.com for a fantastic introduction to Mercurial for people familiar with SVN (or not).

Git was designed to suit kernel development (as shown in the article). For us simple-minded mortals who like SVN, it is much easier to migrate to Mercurial.
