top | item 11407289

Squash your commits

702 points | WillAbides | 10 years ago | github.com

339 comments

[+] WorldMaker|10 years ago|reply
Sometimes I feel like it's a minority position, but I think it's strange, all the effort people go to in order to essentially make the git DAG look like a (lie of a) straight-line CVS or SVN commit list. Seeing how the sausage was actually made (no rebases, no squashes, sometimes not even fast-forwards) isn't pretty, but it is meaningful and will tell you a great deal about a project and its developers... I trust that. It's real and visceral and how software is actually made, and you can learn from that or find things to explore in that jungle. Projects with multiple developers that nevertheless have straight-line commit histories and super-tidy commits are aberrations and full of little lies...

Kudos to GitHub for providing this feature that a lot of people have asked for. I obviously don't plan to use it, but I appreciate that it's an option for those people that like their small, harmless lies. ;)

[+] phasmantistes|10 years ago|reply
I actually disagree. A large team with a linear commit history isn't telling a lie. It means that the code review process is more important than the code writing process.

For example: I check out a repository and create a local feature branch. I create a commit containing the tests for the new feature, then one for the first draft of the new feature, then two or three for bugfixes. Each commit is small and self-contained, but, importantly, isn't standalone. If someone checked out the repository in the middle of my chain of commits, they wouldn't have a working product. Then I upload my change for code review. There's no point in reviewing each of my ~5 commits individually: they only make sense to the reviewer as a combined unit. And there's no point in landing them individually: they only make sense for the overall project history as a combined unit.

In a project with many developers (e.g. 1,000 like the Chromium project), every developer has different local practices. Some keep their work based on HEAD of master via rebase, others via merges. Some do test-driven development, some don't. Making the code review the atomic unit of work, rather than the messy string of local commits, helps the project enforce common etiquette, commit formatting, and readable history.

[+] jasode|10 years ago|reply
>Seeing how the sausage was actually made, ... it is meaningful and will tell you a great deal about a project and its developers... I trust that.

...tidy commits are aberrations and full of little lies...

...small, harmless lies.

Interesting choice of words.

Here's another way to think about squashing private commits for public consumption: programmers do not install keyloggers and upload their entire keystroke history including every Backspace and Ctrl+Z used in their text editor to the repositories. And, most of us wouldn't care about seeing it.

Whether John typed "x = 218^H^H73" or "x = 273" is a meaningless distinction and irrelevant noise. Those spurious ^H Backspaces are equivalent to the twitchy multiple commits in private branches. We really don't want to see them. Think of private noisy commits as an extended workspace of a text editor. If squashing those commits is a lie, the Backspace key without an audited keystroke log is also a lie.

Side note: The other comment downthread about keeping all private commit history for "git bisect" is a red herring. Sometimes a commit will deliberately have broken syntax -- e.g. a quick commit before getting up to grab a soda -- and it won't be CI-test-worthy. Besides, an automated CI server's CPU cycles can point at an upstream integration/test/qa branch instead of a programmer's private branch.

[+] Stratoscope|10 years ago|reply
> It's real and visceral and how software is actually made...

It's also extraordinarily cluttered, and it really gets in the way when someone later wants to do a 'git bisect' to track down when a bug was introduced.

When I'm working I do frequent little commits just to capture and back up my broken stream-of-thought experiments. None of those are going to be relevant to people who work with this code in the future; how does it benefit them to impose my haphazard process on them?

Of course if there's a way to break up my final commit into more meaningful smaller commits, I do that rather than one monolithic commit.

For example if I clean up some whitespace issues, add some new comments to old code, and implement a new feature, I'll put those in three separate commits even if I originally did the entire change at once. In this case you'll see more commits in the public history than I originally had.

Or if I check in a new version of some external library, add an API call that uses it along with its tests, and add UI code that calls the API, those may be separate commits in that order.

My goal is to make the public history useful to future developers.

[+] stormbrew|10 years ago|reply
What frustrates me is that there are mechanisms to deal with this problem that maintain the DAG and history (cleaned or not), but people jump to the solution of destroying the process behind the code rather than use them.

In particular, I would kill for support for basically the `git log --first-parent` option in viewing the history of a branch on tools like github and gitlab. Rather than squashing your branch, you make your merge commit have a meaningful commit message (which you do anyways for a squashed commit) so that you don't always have to be viewing the tangled web underneath.

There should be no practical difference between a squashed commit and a merge commit from the perspective of the branch that commit is on (they represent the exact same change from parent to child), but the tooling insists on giving you the most complicated possible view all the time so there is a tangible difference.
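A throwaway-repo sketch of the difference (branch and message names here are hypothetical). The full log shows the tangled web; `--first-parent` shows only the mainline, so nothing is destroyed and the detail stays one `git log feature` away:

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git config user.name demo && git config user.email demo@example.com
main=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on git version
git commit --allow-empty -qm "initial"
git checkout -qb feature
git commit --allow-empty -qm "WIP: first attempt"
git commit --allow-empty -qm "WIP: works now"
git checkout -q "$main"
# A merge commit with a meaningful message, as suggested above:
git merge -q --no-ff -m "Add feature X: summary of the whole branch" feature
git log --oneline | wc -l                 # 4: every commit, WIPs included
git log --oneline --first-parent | wc -l  # 2: just "initial" and the merge
```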

[+] geuis|10 years ago|reply
Couldn't disagree more. Having worked extensively on teams on both sides of this issue, I can experientially state that a well-done git rebase and commit strategy is much more useful and helpful.

In terms of feature branches:

The individual engineer is free to do individual commits in their branch as they need to in order to keep track of their work. Before they submit a pull request, they should rebase and squash all of their commits into a single one that thoroughly describes everything in the feature that is being committed. When used in conjunction with tools like Phabricator, Arcanist, and commit templates, the workflow is very smooth.

When another team member goes to code review their pull request, rather than having to examine multiple individual commits there is only a single one to examine and comment on.

Master history:

Rather than cluttering up the mainline history with 'Did this', 'Did that', 'Merged: Did this', 'Merged: Did that', 'Reverted: Merged: Did this', etc., you get a series of commits that articulately describe what each commit was for. In the event you need to revert a feature because it breaks something, it's much easier to revert that single commit than to hunt through all of the individual commits from an engineer's feature branch. And in that case, if you revert one of the commits from the feature branch it could break something else.
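A minimal sketch of the rebase-and-squash step described above, in a throwaway repo (all names hypothetical). Here `git reset --soft` is used as the non-interactive equivalent of marking every commit after the first as "squash" in `git rebase -i`:

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git config user.name demo && git config user.email demo@example.com
main=$(git symbolic-ref --short HEAD)
git commit --allow-empty -qm "initial"
git checkout -qb feature
echo "draft" > feature.txt && git add feature.txt && git commit -qm "WIP: draft"
echo "final" > feature.txt && git add feature.txt && git commit -qm "WIP: fix typo"
# Collapse the whole branch into a single commit with a thorough message:
git reset -q --soft "$main"
git commit -qm "Add feature X: everything the reviewer needs to know"
git log --oneline | wc -l   # 2: "initial" plus the one squashed commit
```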

[+] dsp1234|10 years ago|reply
> isn't pretty, but it is meaningful and will tell you a great deal about a project and its developers... I trust that.

  Here is a (made-up) but generally realistic git log:

  git log | grep -i WIP 
  
  mon 5pm - WIP, going to work on this from home 
  tue 4:45pm - WIP, going to work on this from home
  wed 2:30pm - WIP, meeting
  wed 5pm - WIP
  thu Noon - WIP, working from the cafe on my laptop
  fri 5pm - WIP, working from home
  sat 3pm - WIP, heading home sick for the day
Does it really matter to anyone, or count as anything but noise, to know that I committed my work into the repository just so that I could work on it from a different computer? I can't imagine how low the signal-to-noise ratio would be if every person on the team did this.

[+] InclinedPlane|10 years ago|reply
What annoys me is that so many people take the position that there's no other way to improve the experience except for building a hazardous and error-prone system into the core workflow of a tool that should ideally deal primarily with immutable history, rather than building better tooling to manage that complexity and present history in a useful way beyond a raw list of commits. I mean, who would build such tools, and isn't managing complexity hard? Oh wait, it turns out we're developers, and managing complexity is the core reason for the existence of all software engineering. It's really rather silly.

But it's a lot easier to convince devs that using tools with bad UI elements is hardcore and makes them look smart instead of demanding improvements.

[+] zenlikethat|10 years ago|reply
They're not full of little lies. Projects insist on squash-before-merge to help developers keep their sanity.

Imagine working on a fast-changing block of code with 20 or so other people concurrently. If all 20 of those people have patches submitted to master for review, and all of those patches have multiple commits each, then every time one gets merged, the others will have to rebase onto its changes and fix conflicts for _every single commit_ as the fast forward plays out. It's a horrible experience. It prevents desirable code from getting in, as contributors drop out due to the browbeating.

Squashing commits can be the difference between tediously fixing something once vs. tediously fixing it 20 times. No one needs to know that you changed your mind about calling that struct "ConfigOpts" before it was ever introduced upstream.

[+] ealexhudson|10 years ago|reply
Sometimes the scaffolding should be left in place to help show others how to build things; other times, it was put up and taken down so many times that the learning from the first few attempts isn't worth it.

I would probably be much happier not squashing stuff if you were able to bisect cleanly to points between branch merges. I don't think that's even theoretically possible, which means that when you're bisecting it's quite possible to land on half-baked mid-branch points that you have to recognise for the broken rubbish they are. That produces a lot of false positives and makes bisect a lot less useful. On a repo with nice squashed commits, you tend to be able to narrow down to the feature very quickly - of course those commits then tend to be bigger, etc., but I find that less of an issue.

[+] kylnew|10 years ago|reply
Have you ever tried following a change in a repo that came from an unsquashed PR? It's hell.

What's truly meaningful, IMO, is a git log that reads like a product changelog.

[+] DougWebb|10 years ago|reply
Do you see the commit history as a piece of living art, a representation of the community and culture that has produced it, or do you see it as a tool?

I'm an engineer, and I see the commit history as a tool. When I want to know what a block of code is for and why it was written the way it was, the commit history (if it is clean and granular) will tell me a lot about that, and will point me to authors, issues, features, and requirements where I can learn more. I don't care about the process of producing the code, I care about the end result. I get enough exposure to the process when I'm writing my own code.

[+] tomphoolery|10 years ago|reply
There's definitely some merit in keeping the original intent of commit messages even when "committing early and often", but at the end of the day a pull request is supposed to tell some kind of story. When you `git blame` or `git log` a particular file, aren't you more interested in the higher-level changes than a single developer's own miniature storyline of how it got that way?

I'll give you another analogy... If I'm composing a song, and you're not particularly trying to learn how to write songs, what is more relevant to you, the end result or the various drafts and early versions that led to it? For the majority of people who aren't interested in learning the mechanics of songwriting, the "journey" is definitely not as interesting as the destination. For nerds like you and me, that's part of the fun! So I try to keep the original intent of my commits, and preserve their messages in a bullet-pointed list format, to show the individual changes that were made in addition to a higher-level overview of the overall change to the project.

TLDR: Project-level changes are not the same as individual changes, and while both should be represented in commit messages, the project-level changes are overwhelmingly more useful in the future. Git is not about code storage, it's about code communication. It's about developers on the same team communicating with both prose and code in tandem.

[+] kazinator|10 years ago|reply
If you rebase your unpublished work on top of some recent changes, that is in no way a "lie". Doing it via a merge is completely uninformative, in fact. Oh, George was working, by complete coincidence, at the same time, on something completely unrelated and that had to be merged against my change. Whoa, inform me more softly there, I can't take the raw cognitive overload!

Multi-parent commits complicate the use of git. When a commit has a single parent, we can pretend that it's a delta: a patch. (Like it fscking should be in a decent version control system based on some sort of patch theory!) When we "show -p" that commit, we get a diff, which is against its one and only parent. Multiple parents also complicate certain situations. They rear their ugly heads and create an ambiguity. For instance, consider git cherry-pick. If a given commit has just one parent, we can pretend that it's a delta and "git cherry-pick" it. If it has multiple parents, the ugly truth is revealed: a git version isn't actually a delta. If you want that change, you need to specify the parent!

The parent of a commit should be the thing that the work was actually based on: the work that the developer took and massaged to create the new baseline representing the commit. When you have multiple parents, only one of the parents actually meets this definition. The others are arbitrary nodes in the system which are just installed as the parents.
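The cherry-pick ambiguity described above is easy to reproduce in a throwaway repo (all names hypothetical): git refuses to cherry-pick a merge commit until you name a baseline parent with `-m`:

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git config user.name demo && git config user.email demo@example.com
main=$(git symbolic-ref --short HEAD)
echo "base" > f.txt && git add f.txt && git commit -qm "base"
git checkout -qb topic
echo "topic change" >> f.txt && git add f.txt && git commit -qm "topic"
git checkout -q "$main"
git merge -q --no-ff -m "merge topic" topic
merge=$(git rev-parse HEAD)
git checkout -qb elsewhere "$merge^"   # another branch that wants this change
if git cherry-pick "$merge" 2>/dev/null; then
  echo "unexpected: merge commit cherry-picked without -m"
fi
# -m 1 declares the first parent as the baseline, so the merge commit
# can finally be treated as an ordinary delta against the mainline:
git cherry-pick -m 1 "$merge" >/dev/null
```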

[+] dahart|10 years ago|reply
Speaking of trust, I have a really hard time taking seriously the argument that squashing or rebasing is "lying". If you'd stuck to meaningful, I'd consider it seriously. But "lying"? This may be git's biggest flame war, this is an old and tired debate, and calling it "lying" is an over-the-top ham-fisted value judgement of a technical workflow that is a personal choice with legitimate reasons to go either way.

Also, Linus endorses cleaning up your WIP commits before pushing.

[+] wheels|10 years ago|reply
I don't find "seeing how the sausage was made" merely interesting. It's often very instructive.

Often there will be counter-intuitive bits of code that make sense in "git blame" if you see the original, small, commit that they were created as part of. If they're part of a 1000+ line feature bomb, you lose that important context.

[+] chris_wot|10 years ago|reply
For a while I tried keeping my commits as small as possible in LibreOffice so that I didn't break the build. Unfortunately this caused major issues in backporting fixes, so I recently changed my practice.

Whilst my small commits were squashed, this shows that even well-formed smaller commits that are part of a larger change can often be problematic.

Keeping your commit history clean is important. When I'm bisecting, I don't want to see coding errors like typos and syntax errors; they literally get in the way of the bisect. And when I'm reading through a source file, I'd like each commit to be significant, or at least entirely relevant to a change. Minor syntax-error fixes, whilst they can still sneak into the master repository, should be few and far between.

Basically, it also encourages unit testing, rechecking your code, continuous integration, and a raft of other good and best practices around coding. And your colleagues will thank you.

Merge squashing is actually a pretty decent way around this - I'm going to use it as my workflow now. I'll make frequent commits on a separate branch, then squash them down into another branch, then push that.

[+] cballard|10 years ago|reply
This is a bad idea masquerading as a good idea. Before making a pull request (or doing any sort of merge), you should rebase against upstream master (or whatever you're going to push to). However, keeping distinct atomic commits that change one and only one small thing, when possible, is much preferable if bisect or blame is used. If you have broken or poorly written commits, use fixup, reword, squash, etc. in rebase -i.

Using fast-forward (and possibly only allowing fast-forward) is a good idea. Squashing entire pull requests that may change multiple things into a single commit is a very bad idea.

[+] JoshTriplett|10 years ago|reply
If someone prepares a pull request with a well-structured series of commits, making a logical series of changes, where the project builds and passes tests after each commit, then those commits shouldn't get squashed.

However, I frequently see people adding more commits on top of a pull request to fix typos, or do incremental development, where only the final result builds and passes, but not the intermediate stages, and where the changes are scattered among the commits with no logical grouping. In that case, I'd rather see them squashed and merged than merged in their existing form, and having a button to do that makes it more likely to happen.

[+] michaelmior|10 years ago|reply
I used to feel the same way (and still do to some degree). However I think the issue is more nuanced. I agree that rebasing beforehand is a good idea. But I can see the value in keeping commits on the master branch corresponding to specific features or bug fixes (which presumably map to PRs).

I think the argument can be made that if you don't feel comfortable performing a squashed merge of a PR, then that PR contains too much work and should be split up. However, I don't think there's an easy rule to decide in either case.

[+] BinaryIdiot|10 years ago|reply
> Before making a pull request (or doing any sort of merge), you should rebase against upstream master (or whatever you're going to push to)

See, and maybe this is because I'm just dumb or something, but I have never gotten rebasing to work for me. Ever. Every single time I do it I read at least 3 articles about it so I don't screw something up, I attempt to do it, and ultimately I lose a bunch of work.

I just don't get it. I can write web, mobile and desktop apps and I like to think I'm pretty good at it. But I'm one of those people who constantly have merge commits in their history because, for whatever reason, I just can't get my head around making rebasing work correctly.

Am I the only one? Sorry for the derail but it's bothering me that I've never gotten this to work correctly and I feel otherwise normally smart. ¯\_(ツ)_/¯

[+] zb|10 years ago|reply
I think that GitHub's pull-request-based model is fundamentally broken. Gerrit's model, where every commit is quasi-independent (and hence must pass tests) and you can easily edit without force-pushing anything or losing review history, is superior (though not perfect) in almost all cases. (Exception: merging a long-running feature branch where all the commits in the branch have already been reviewed.)

This is GitHub's attempt to solve the problem without really changing anything. It won't really change anything. Since pull requests routinely contain a mixture of both changes that should be squashed (fixups) and changes that should not be squashed (independent changes), this just means that you get to pick your poison.

[+] natrius|10 years ago|reply
I used to strongly believe what you do until my company started using Phabricator, which forces the squash workflow on you. It makes your history more useful, not less. The pull request is the appropriate unit of change for software. Make small commits as you develop, then squash them down into a single meaningful change to the behavior of your software.

[+] 3JPLW|10 years ago|reply
Sure, it's a terrible feature to always use. And it's likely to be of little use to contributors who know how to use Git well. But in large open-source projects, often new contributors make a small change that needs a few minor corrections. Eliminating that final back-and-forth ("squash please") is a huge win for maintainers.

[+] haberman|10 years ago|reply
There's no guarantee that every individual commit of a feature branch is meaningful, or even builds. It also makes the history of the master branch a lot harder to read when it has tons of commits representing the minutiae of the feature's development.

[+] falsedan|10 years ago|reply
I use this workflow:

  1. branch off master
  2. work, commit, push, test (on CI server)
  3. decide it's time to ship
  4. rebase -i, push, test (again)
  5. git checkout master && git merge --no-ff feature_branch
     (make the merge commit message a summary of the feature)

master ends up being a list of feature branch commits, bookended by the merge commit which introduced the feature. Getting the squash commit diff is as easy as 'git diff feature_branch_merge^..feature_branch_merge'.

[+] phasmantistes|10 years ago|reply
Squashing entire pull requests that change multiple things into a single commit is a bad idea, yes. But uploading and asking for review on such wide-reaching pull requests is a bad idea in the first place.

Using fast-forward without squash is also a bad idea in many cases: the string of commits may contain multiple points that don't actually build or pass tests, even if the final commit in the chain fixes all that. There's no point in landing those broken commits, and doing so will confuse bisection tools.

Fast-forward with squash, and enforcing reasonably sized code reviews as a matter of culture, is the best of all worlds in my opinion.

[+] golergka|10 years ago|reply
Why would you want to rewrite your whole work history and change the actual state of the repository at each of your commits? Why not just merge?

[+] draw_down|10 years ago|reply
Rebasing seems to clutter the GitHub PR's commit history and diff with all the commits to master that were made between the time the branch was cut and the time the rebase happens. But it doesn't do that if you merge in master. I never understood this.

[+] jgraham|10 years ago|reply
It's a bad idea because it's a bad implementation. If it allowed you to select what to squash, defaulting to the behaviour of git rebase -i --autosquash master, then it would clearly be a good feature.

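For reference, a sketch of that `--autosquash` behaviour in a throwaway repo (names hypothetical). `git commit --fixup` records a `fixup! …` commit, and `GIT_SEQUENCE_EDITOR=true` accepts the generated todo list unchanged, so the rebase runs non-interactively:

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git config user.name demo && git config user.email demo@example.com
main=$(git symbolic-ref --short HEAD)
git commit --allow-empty -qm "initial"
git checkout -qb feature
echo "one" > g.txt && git add g.txt && git commit -qm "add g"
echo "two" > g.txt && git add g.txt
git commit -q --fixup HEAD     # records "fixup! add g"
# --autosquash reorders the fixup next to its target and squashes it in:
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash "$main" >/dev/null 2>&1
git log --oneline | wc -l      # 2: "initial" and a single "add g" commit
```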
[+] serge2k|10 years ago|reply
> Squashing entire pull requests that may change multiple things into a single commit is a very bad idea.

If changes are too large/complex/disjoint to fix in a single commit then why have them in one PR?

[+] jibsen|10 years ago|reply
I wonder why they did not add `--ff-only` as an option, like GitLab has.
[+] Pxtl|10 years ago|reply
This is a presentation issue masquerading as a data issue. If somebody suggested deleting data because a report was ugly, they'd be laughed out of the room.

Give us tools to mark commits as unimportant or group them together as a meta-commit object for history purposes.

[+] gonyea|10 years ago|reply
I hate squashing branches. There's a lot of value in commit messages; they're educational and are a form of documentation. It's on the developer to squash the "oops" commits during a rebase, rewriting the commit message so it has some value when going back in time to look at changes.

I'd love to see a commit linter that points at commits with text like "oops" and "fix my derp" to suggest possible commits to squash.

Git history shouldn't resemble a hot mess, but the evolution of code should be pretty granular. I'd take the hot mess over full squashing, though.

[+] gus_massa|10 years ago|reply
Perhaps I'm greedy, but I'd also like an option to rebase without squashing ...

Also, it's not clear if it is possible to disable the merge button completely. I prefer to use the command line to rebase and fix the details in the commits, but the big green "merge" button is always too tempting and it's easy to press it by mistake.

[+] ianleeclark|10 years ago|reply
This is a feature I've wanted for such a long time. While it's perfectly feasible to do through the command line, I've always found myself having to force push to update the history on github's side of things.

I typically did it through git rebase -i HEAD~N, so maybe someone here on HN knows of a better way to squash a commit whenever you're updating remote history. That said, it seems that updating remote history with a squashed commit isn't entirely attractive behavior, and that's why I was forced to force push.
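After rewriting already-pushed history, a force push is indeed unavoidable; the gentler variant is `--force-with-lease`, which only overwrites the remote if it still matches what you last fetched. A throwaway sketch with a local bare repo standing in for GitHub (all names hypothetical):

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q --bare remote.git          # stand-in for the GitHub remote
git clone -q remote.git work 2>/dev/null
cd work
git config user.name demo && git config user.email demo@example.com
branch=$(git symbolic-ref --short HEAD)
echo "a" > f.txt && git add f.txt && git commit -qm "WIP one"
echo "b" > f.txt && git add f.txt && git commit -qm "WIP two"
git push -q origin "$branch"
# Squash the two WIP commits into one, rewriting the pushed history:
git reset -q --soft HEAD~1
git commit -q --amend -m "Add feature: one tidy commit"
if git push -q origin "$branch" 2>/dev/null; then
  echo "unexpected: non-fast-forward push succeeded"
fi
git push -q --force-with-lease origin "$branch"
```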

[+] liquidise|10 years ago|reply
Oh god no. I made the mistake of moving a team to squashed commits once. The lack of individual commits poses large problems down the line. Two nonstarters come to mind:

1: It completely ruins your ability to git bisect any bug injected in your branch. Instead of getting a 10-line commit, bisect will point you to hundreds or thousands of lines instead.

2: All code will blame to a single person. Code with 6 people on a large branch? Want to git blame the code to see who wrote the function that is weird looking so you can ask questions? Too bad.

Do not squash branches on teams. One of the biggest mistakes of my professional career.
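For what it's worth, the bisect half of this is easy to demonstrate: with one small commit per change, `git bisect run` lands on the exact commit that introduced the problem, whereas a squashed branch can only ever point at the whole thing. A toy reproduction (file name and "bug" marker are hypothetical):

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git config user.name demo && git config user.email demo@example.com
for i in 1 2 3 4 5; do
  [ "$i" -eq 3 ] && echo "bug" >> f.txt   # the regression lands in commit 3
  echo "line $i" >> f.txt
  git add f.txt && git commit -qm "commit $i"
done
root=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$root" >/dev/null 2>&1   # HEAD is bad, the root is good
git bisect run sh -c '! grep -q bug f.txt' >/dev/null 2>&1
bad=$(git rev-parse refs/bisect/bad)
git log -1 --format=%s "$bad"   # prints "commit 3"
git bisect reset >/dev/null 2>&1
```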

[+] chmike|10 years ago|reply
This is awful because it destroys information.

A feature branch needs cleaning up before publishing. Bug and typo fixes need to be squashed within the branch, but the history should preserve a sequence of commits corresponding to atomic changes, because it tells a story.

A commit should be small enough to make it easy to check that the change is correct. It should be self-contained so that it can be moved around, cherry-picked, etc.

This is why the local commit history should be considered as a draft of the story, and the one published should be the official one aimed to be easily readable, verifiable and manipulable.

[+] lotyrin|10 years ago|reply
Any plans to allow for squashing but with a merge instead of a fast-forward or fast-forwards without squashing?

CI will still run against the hypothetical merge commit, no? I wonder if there are edge cases where merge vs squash+fast-forward would result in different conflict resolutions and different trees, so master could end up with a tree that didn't have tests run against it.

[+] looneysquash|10 years ago|reply
I'm confused. When I don't want to see the details of the topic branches, I run 'git log --first-parent'.

Why not make it easier to see what I want to see, but then let me drill down and see more, instead of removing the details all together?

[+] ltbarcly3|10 years ago|reply
No, don't squash your commits. This is stupid advice that comes back over and over.

Squashing commits is a useless thing that has absolutely no benefit. It's dumb. It really makes no sense. It has very clear negatives.

https://news.ycombinator.com/item?id=5631184

[+] abalashov|10 years ago|reply
Call me naive, but wouldn't this problem be best solved by requiring that commits represent meaningful and working increments of work?

I use 'git add -p' judiciously and only commit when I've reached a point where something can usefully be said to be in some way "done". Sure, it's not perfect, and occasionally I end up having to do some cleanup of miscellaneous printf statements, debug values or typos in subsequent commits, but this is something that should really be avoided if possible.

[+] pkamb|10 years ago|reply
I'm waiting for the code review tool that takes a massive squashed PR and cuts it up into a series of clean, atomic commits that can be reviewed individually.
[+] cyphar|10 years ago|reply
Yeah, this is dumb. Being able to effectively bisect large projects depends on having smaller commits. If your commits are all huge, your bisect will hit some 500 line patch that does 5 different things -- it'll easily quadruple the debugging time. Not to mention that having small self-contained commits is a good thing.
[+] golergka|10 years ago|reply
This, like rebase, is a great idea for people who would rather use SVN instead, but have to use git for some reason or another.

I mean, if you want your history simplified into a linear squashed series of commits, why the hell do you use a VCS which models history as a directed acyclic graph in the first place?

[+] overgard|10 years ago|reply
Maybe I'm missing something, but I've never understood the point of cleaning up commit history. The only time I ever look at commit history past a few days is if I want to know when something broke (in which case I WANT every part of the history, even the messy bits), or if I'm trying to refresh my memory of what I've done for the year (for performance reviews or whatever) in which case I guess it's marginally useful, but it's pretty easy to skim over "fix build" and such.
[+] dboreham|10 years ago|reply
As this thread approaches 300 posts I'm wondering when we're going to get out of this Git Tarpit we've somehow got ourselves into?

To be charitable: Git seems to be a good tool designed for Problem A, being widely used for Problems B, C, and D, for which it is a fairly poor choice.

I _think_ we got here through some mix of "but it's really fast!"; "I can do _anything_!!"; "But Linus says its great!"; "I don't need to pay for a beefy server any more!"; "New _must_ be good, right?".

In any event, how do we get out of the ditch and back to work making software vs. trying to reason about the unreasonable? I personally, for the kind of projects I'm involved with (small teams, all paid by the same piper, with clear aligned goals and competent coders), have had perfectly satisfactory revision control systems since around 2000 (except when required by an employer to use ClearCase..). It would be nice to get back to that future.

[+] nullc|10 years ago|reply
Annoying: you can't turn off both. If your project has a workflow where the web UI merge should not be used (e.g. using signed merges), there is still no way to achieve that.