Squashing only results in a cleaner commit history if you're making a mess of the history on your branches. If you're structuring the commit history on your branches logically, squashing just throws information away.
I’m all ears for a better approach because squashing seems like a good way to preserve only useful information.
My history ends up being:
- add feature x
- linting
- add e2e tests
- formatting
- additional comments for feature
- fix broken test (ci caught this)
- update README for new feature
- linting
With a squash it can boil down to just “added feature x” with smaller changes inside the description.
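Here is a minimal sketch of what that squash looks like at merge time, using plain `git merge --squash` in a throwaway repository. The branch name `feature-x`, the file `app.txt` and the commit messages are made up for illustration. Hosting platforms' "squash and merge" buttons do roughly the same thing server-side.

```shell
set -eu
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
trunk=$(git symbolic-ref --short HEAD)   # "main" or "master", depending on your config

echo "v1" > app.txt
git add app.txt
git commit -q -m "initial"

# A messy feature branch like the one above (hypothetical changes).
git checkout -q -b feature-x
for msg in "add feature x" "linting" "add e2e tests" "formatting"; do
  echo "$msg" >> app.txt
  git commit -q -am "$msg"
done

# Squash-merge: stage the branch's combined diff on the trunk, then commit it once.
git checkout -q "$trunk"
git merge -q --squash feature-x
git commit -q -m "added feature x" -m "Includes e2e tests plus linting and formatting fixes."

git log --format=%s "$trunk"             # only "added feature x" and "initial"
```

The four branch commits remain reachable only through `feature-x`. Once that branch is deleted, the per-commit detail is gone from the trunk's history, which is the trade-off the first comment objects to.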
If my change is small enough to be treated as one logical unit, one that will be reviewed, merged and (hopefully not) reverted as a unit, all these follow-up commits get amended into the original commit. There's nothing wrong with a small change consisting of just one commit, even if the work wasn't written or committed all at once.
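Folding follow-up work into the original commit like this is what `git commit --amend` does. Below is a minimal sketch with hypothetical file names:

```shell
set -eu
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "feature x" > x.txt
git add x.txt
git commit -q -m "add feature x"

# Later follow-up work (here a test file) is folded into the same commit
# instead of becoming a separate "add e2e tests" or "linting" commit.
echo "e2e test for feature x" > x_test.txt
git add x_test.txt
git commit -q --amend --no-edit         # keep the original message

git log --format=%s                     # still a single "add feature x" commit
```

Amending rewrites the commit. If the branch has already been pushed, updating it requires a force push.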
Where logical commits (also called atomic commits) really shine is when you're making multiple logically distinct changes that depend on each other. E.g. "convert subsystem A to use api Y instead of deprecated api X", "remove now-unused api X", "implement feature B in api Y", "expose feature B in subsystem A". Now they can be reviewed independently, and if feature B turns out to need more work, the first commits can be merged independently (or if that's discovered after it's already merged, the last commits can be reverted independently).
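The sketch below shows that last scenario: a stack of commits that mirrors the example messages, where `git revert` backs out only the two feature-B commits. The files are placeholders for real code, and the extra "initial" commit is an assumption added so the stack has a starting point.

```shell
set -eu
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Placeholder files standing in for subsystem A and the two APIs.
echo "uses api X" > subsystem_a.txt
echo "old api" > api_x.txt
git add .
git commit -q -m "initial"

echo "uses api Y" > subsystem_a.txt
echo "new api" > api_y.txt
git add .
git commit -q -m "convert subsystem A to use API Y instead of deprecated API X"

git rm -q api_x.txt
git commit -q -m "remove now-unused API X"

echo "feature B" >> api_y.txt
git commit -q -am "implement feature B in API Y"

echo "exposes feature B" >> subsystem_a.txt
git commit -q -am "expose feature B in subsystem A"

# Feature B needs more work: revert only the last two commits (newest first),
# leaving the API migration and the removal of API X in place.
git revert --no-edit HEAD~2..HEAD

git log --format=%s
```

If the four changes had been squashed into one commit, you could not back out feature B this way without also undoing the API migration.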
If, after creating (or pushing) this sequence of commits, I need to fix linting/formatting/CI, I put the fixes in a fixup commit targeting the appropriate commit and meld them in with a rebase. That takes about 30 seconds to do manually and can be automated with tools like git-absorb. In practice, though, I rarely need to: I already break bigger tasks into logical chunks because it helps me stay focused, and I add tests and run linting/formatting/etc. before I commit.
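The manual version of that fixup-and-rebase step can be sketched with `git commit --fixup` and `git rebase --autosquash`. Setting `GIT_SEQUENCE_EDITOR=true` accepts the generated todo list without opening an editor. That only makes sense in a script; interactively you would review the list. This sketch does not use git-absorb, and the file names are hypothetical.

```shell
set -eu
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > README
git add README
git commit -q -m "initial"
base=$(git rev-parse HEAD)

# Two logical commits on the branch.
echo "x" > feature_x.txt
git add feature_x.txt
git commit -q -m "add feature x"
target=$(git rev-parse HEAD)

echo "tests" > e2e.txt
git add e2e.txt
git commit -q -m "add e2e tests"

# CI flags a lint problem in feature x: record the fix as a fixup of that commit.
echo "x (linted)" > feature_x.txt
git add feature_x.txt
git commit -q --fixup="$target"          # subject becomes "fixup! add feature x"

# Replay the branch; --autosquash moves the fixup right after its target and melds it in.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$base"

git log --format=%s "$base"..HEAD        # "add e2e tests", then "add feature x"
```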
And yes, more or less the same result can be achieved by creating multiple MRs and squashing each one, but that's usually a much worse experience.
You can always take advantage of the graph structure itself. With `--first-parent`, `git log` shows only your integration points (top-level merge commits, such as PR merges made with `--no-ff`), e.g. `Added feature X`. `--first-parent` works with blame, bisect, and other commands as well. When you need, or just prefer, linear history, you have `--first-parent`; when you need the details "inside" a previous integration, you can still get to them. You preserve all the information and still focus on the top-level view by default.
It's just too bad that so few graphical UIs default to `--first-parent` and a drill-down approach instead of cluttered "subway graphs".
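The following sketch shows the integration-point view. A feature branch is merged with `--no-ff`, and the history is then read back with `--first-parent`. Branch, file and message names are hypothetical. `HEAD^1..HEAD^2` is one way to drill into what a single merge brought in.

```shell
set -eu
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
trunk=$(git symbolic-ref --short HEAD)   # "main" or "master", depending on your config

echo "v1" > app.txt
git add app.txt
git commit -q -m "initial"

# Detailed work on a feature branch.
git checkout -q -b feature-x
echo "x" >> app.txt
git commit -q -am "add feature x"
echo "tests" > e2e.txt
git add e2e.txt
git commit -q -m "add e2e tests"

# Integrate with a real merge commit even though a fast-forward was possible.
git checkout -q "$trunk"
git merge -q --no-ff --no-edit -m "Added feature X" feature-x

echo "first-parent view (integration points):"
git log --first-parent --format=%s      # "Added feature X", then "initial"
echo "drill down into that merge:"
git log --format=%s HEAD^1..HEAD^2      # the two feature-branch commits
```

`git blame` also accepts `--first-parent`, and recent Git versions accept it for `git bisect start`.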
Stacked diffs are the best approach. Working at a company that uses them, and reading about the "pull request" workflow everyone else subjects themselves to, makes me wonder why everyone isn't using stacked diffs instead of eternally repeating this "squash vs. not squash" debate.
Every commit is reviewed individually. Every commit must have a meaningful message, no "wip fix whatever" nonsense. Every commit must pass CI. Every commit is pushed to master in order.
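You can approximate the "every commit must pass CI" rule locally before pushing with `git rebase --exec`. It runs a command after each replayed commit and stops at the first one where that command fails. This is a sketch, not the stacked-diff tooling the comment refers to. The `test -f api_y.txt` check is a hypothetical stand-in for a real test suite.

```shell
set -eu
cd "$(mktemp -d)"                        # throwaway repository
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "uses api X" > subsystem_a.txt
git add subsystem_a.txt
git commit -q -m "initial"

# A small stack of dependent commits; each should be reviewable on its own.
echo "new api" > api_y.txt
git add api_y.txt
git commit -q -m "add API Y"
echo "uses api Y" > subsystem_a.txt
git commit -q -am "convert subsystem A to use API Y"

# Re-run the (placeholder) check at every commit in the stack;
# the rebase halts on the first commit for which the command exits non-zero.
git rebase -q --exec 'test -f api_y.txt' HEAD~2
echo "every commit in the stack passed the check"
```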
Not everyone develops and commits the same way and mandating squashing is a much simpler management task than training up everyone to commit in a similar manner.
Besides, developers probably shouldn't try to make their PR commits atomic; they should commit as often as needed, because it's a good way to avoid losing work. That is in tension with leaving behind clean commits, and squashing resolves it.
At work there was only one way to test a feature, and that was to deploy it to our dev environment. The only way to deploy to dev was to commit the changes to a branch and deploy from that branch.
So one branch had 40x "Deploy to Dev" commits. And those got merged straight into the repo.
Good luck getting 100+ devs to all use the same logical commit style. And if tests fail in CI, you get the inevitable "fix tests" commit in the branch, which now spams your main branch more than the meaningful changes do. You could rebase the history by hand, but what's the point? You'd have to force push anyway. Squashing is the only practical way for large orgs to keep history clean.
Also, rebasing is just so fraught with potential errors: every month or two, the devs who were rebasing would screw up some feature branch holding work they needed and, for some reason, look to me to fix it. Such a time sink for so little benefit.
I eventually banned rebasing, force pushes, and mandated squash merges to main - and we magically stopped having any of these problems.
True, but there's a huge trade-off in time management.
I can spend hours OCDing over my git branch commit history.
-or-
I can spend those hours getting actual work done and squash at the end to clean up the disaster of commits I made along the way so I could easily roll back when needed.
literallyroy|2 months ago
Denvercoder9|2 months ago
WorldMaker|2 months ago
mh2266|2 months ago
TheGRS|2 months ago
esafak|2 months ago
sallveburrpi|2 months ago
Other than that, you're pretty free in how you write commit messages.
bb88|2 months ago
They added no information.
eddd-ddde|2 months ago
No information loss, and every commit is valid on its own, so cherry-picks maintain the same level of quality.
trevor-e|2 months ago
mattbillenstein|2 months ago
Denvercoder9|2 months ago
The Linux kernel manages to do it for 1000+ devs.
mmh0000|2 months ago
tedmiston|2 months ago