GitHub-style rebase-only PRs have turned out to be the best compromise between the 'preserve history' and 'linear history' strategies:
All PRs are rebased and merged in a linear history of merge commits that reference the PR#. If you intentionally crafted a logical series of commits, merge them as a series (ideally you've tested each commit independently), otherwise squash.
If you want more detail about the development of the PR than the merge commit, aka the 'real history', then open up the PR and browse through Updates, which include commits that were force-pushed to the branch and also fast-forward commits that were appended to the branch. You also get discussion context and intermediate build statuses etc. To represent this convention within native git, maybe tag each Update with pr/123/update-N.
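In native git, that tagging convention might look something like this (a sketch; the PR number, branch name, and tag names are all invented for the example):

```shell
# Hypothetical convention: snapshot each Update of PR #123 as a tag, so the
# pre-rebase 'real history' survives force-pushes.
git tag pr/123/update-1 my-feature        # branch tip as first pushed
# ...review happens, the branch is force-pushed...
git tag pr/123/update-2 my-feature
git push origin 'refs/tags/pr/123/*'      # publish the snapshots
git range-diff pr/123/update-1...pr/123/update-2   # compare the two versions
```

`git range-diff` then shows how each patch in the series changed between Updates, which is roughly what the PR's Updates view gives you.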
The funny thing about this design is that it's actually more similar to the kernel development workflow (emailing crafted patches around until they are accepted) than BOTH of the typical hard-line stances taken by most people with a strong opinion about how to maintain git history (only merge/only rebase).
What's weird about most of these discussions is how they're always seen as technical considerations distinct from the individuals who actually use the system.
The kernel needs a highly-distributed workflow because it's a huge organization of loosely-coupled sub-organizations. Most commercial software is developed by a relatively small group of highly-cohesive individuals. The forces that make a solution work well in one environment don't necessarily apply elsewhere.
With this, you can also push people towards smaller PRs, which are easier to review and integrate.
The downside is that if you need to work on feature 2 based on feature 1, either you wait for the feature 1 PR to be merged into main (the easiest approach), or you fork from your feature branch directly and will need to rebase later (this can get messier, especially if you need to fix errors in feature 1).
I want the 'merge' function completely deprecated. I simply don't trust it anymore.
If there are no conflicts, you might as well rebase or cherry-pick. If there is any kind of conflict, you are making code changes in the merge commit itself to resolve it. Developers end up fixing additional issues in the merge commit instead of in the actual commits.
If you use merge to sync two branches continuously, you completely lose track of which changes were done on the branch and which were done on the mainline.
I don't know how stupid this is on a scale from 1 to 10. I've created a wrapper [1] for git (called "shit", for "short git") that converts non-padded revisions to their padded counterpart.
Examples:
"shit show 14" gets converted to "git show 00000140"
"shit log 10..14" translates to "git log 00000100..00000140"
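The core of such a wrapper is just zero-padding; a minimal sketch (assuming 7-digit zero-padded prefixes, which may not match what the real script does):

```shell
# Sketch of a "short git": zero-pad bare revision numbers and A..B ranges,
# pass everything else through untouched. The 7-digit width is an assumption.
pad() { printf '%07d' "$1"; }
expand() {
  case "$1" in
    *..*) printf '%s..%s' "$(pad "${1%%..*}")" "$(pad "${1##*..}")" ;;
    ''|*[!0-9]*) printf '%s' "$1" ;;   # not a bare number: leave as-is
    *) pad "$1" ;;
  esac
}
args=
for a in "$@"; do args="$args $(expand "$a")"; done
git $args   # unquoted on purpose in this sketch; a real script needs more care
```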
Mercurial has always had sequential revision numbers in addition to hashes for every commit.
They aren't perfect, of course. All they indicate is in which order the current clone of the repo saw the commits. So two clones could pull the commits in different order and each clone could have different revision numbers for the same commits.
But they're still so fantastically useful. Even with their imperfections, you know that commit 500 cannot be a parent of commit 499. When looking at blame logs (annotate logs), you can be pretty sure that commit 200 happened some years before commit 40520. Plus, if your repo isn't big (and most repos on GitHub are not that big by number of commits), your revision numbers are smaller than even short git hashes, so they're easier to type in the CLI.
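Plain git can produce similar per-clone ordinals, with the same caveats (a sketch; `--first-parent` keeps the numbering reasonably stable on branchy histories):

```shell
# A per-clone ordinal: how many commits are reachable from a given one.
git rev-list --count HEAD
# Map an ordinal back to a hash, e.g. the 500th first-parent commit:
git log --reverse --first-parent --format=%H | sed -n '500p'
```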
Seems like a design fault in git that commits only have a single id (sha1 hash) and that hashes are written without any prefix indicating which type of id it is.
If all hashes were prefixed with "h", it would have been so simple to add another (secure) hash and a serial number.
E.g. h123456 for the sha1, k6543 for sha256 and n100 for the commit number.
I fail to see the point of this; in fact, I think it's a fundamentally flawed approach to dealing with your revision history. The problem is that rebasing commits can break the integrity of your commit history.
How are you going to deal with non-trivial feature branches that need to be integrated into master? Squash them and commit? Good luck when you need to git bisect an issue. Or rebase and potentially screw up the integrity of the unit test results in the rebased branch? Both sound unappealing to me.
The problem is not a history with a lot of branches in it; it is not knowing how to use your tools to present a view of that history that you are interested in and that is easy for you to understand.
> The problem is not a history with a lot of branches in it; it is not knowing how to use your tools to present a view of that history that you are interested in and that is easy for you to understand.
To me this is like saying to a construction worker: “The problem is not that your hammer has sharp spikes coming out of the handle at every angle. The problem is that you don’t put on a chain mail glove when using it.” That’s certainly one way to look at it.
This somewhat depends on how big your features are. Arguably, large long-lived feature branches are the problem themselves. If larger features are broken down and developed/merged piecemeal, then you still have smaller commits you can fall back on.
IIRC, GitHub uses a development model where partially implemented features are actually deployed to production, but hidden behind feature flags.
I'm pretty sure the point is that this is a one-person project and the author can play around. He's not suggesting that your team of 100 people adopt this for the development of a commercial product.
I think the fundamental misunderstanding people with your point of view have regarding linear commit histories is that it's not just about different VCS usage; the entire development process changes.
When you are using linear histories and rebasing you don't do monolithic feature branches. You land smaller chunks and gate their functionality via some configuration variable. `if (useNewPath) { newPath(); } else { oldPath(); }` and all your new incremental features land in `newPath`. All tests pass on both code paths and nothing breaks. When the feature is fully done then you change the default configuration to move to the `newPath`.
> How are you going to deal with non-trivial feature branches that need to be integrated into master?
That's the point -- this isn't a thing in rebase workflows. That's a feature. You don't have to deal with megapatches for massive features. It's incrementally verified along the way and bisect works flawlessly.
It is amazing how much time projects seem to spend on rewriting history for the goal of displaying it in a pretty way. Leaving history intact and having better ways to display it seems far saner. Even after a merge, history in the branch may be useful for bisect, etc.
Git does something called "packing" when it detects "approximately more than <X (configurable)> loose objects" in your .git/objects/ folder. The key word here is "approximately". It will guess how many total objects you have by looking in a few folders and assuming that the objects are uniformly distributed among them (these folders consist of the first 2 characters of the SHA-1 digest). If you have a bunch of commits in the .git/objects/00/ folder, as would happen here, git will drastically over- or under-approximate the total number of objects depending on whether that 00/ folder is included in the heuristic.
This isn't the end of the world, but something to consider.
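For reference, the relevant knob and a way to see the real counts (the sampling detail comes from git's gc implementation, which extrapolates from a single fan-out directory, objects/17/, times 256):

```shell
git config --default 6700 --get gc.auto   # the loose-object threshold (default 6700)
git count-objects -v                      # exact loose/packed counts, no sampling
# The directory gc actually samples:
ls .git/objects/17 2>/dev/null | wc -l
```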
I think the sweet spot in developer productivity was when we had SVN repos and used git-svn on the client. Commits were all rebased at the git level prior to pushing. If you committed something that broke unit tests your colleagues would pass you a really ugly plush animal of shame that would sit on your desk until the next coworker broke the build.
We performed code review with a projector in our office jointly looking at diffs, or emacs.
Of course it’s neat to have GitHub Actions now and pull requests for asynchronous code review. But I learned so much from my colleagues directly in that now-obscure working mode, and I’m still grateful for it.
> If you committed something that broke unit tests your colleagues would pass you a really ugly plush animal of shame that would sit on your desk until the next coworker broke the build.
We did have an ugly plush animal, but it served more obscure purposes. For blame of broken builds, we had an info screen that counted the number of times the build had passed, and below that displayed the name of the person who last broke it.
Explaining to outsiders and non-developers that "Yes, when you make a mistake in this department, we put the person's name on the wall and don't take it down until someone else makes a mistake" sounds so toxic. But it strangely enough wasn't so harsh. Of course there was some stigma that you'd want to avoid, but not to a degree of feeling prolonged shame.
I have my old team's rubber chicken and I'm never giving it up.
In-person code review is the only way to do it. Pull requests optimize for the wrong part of code review, so now everyone thinks it's supposed to be a quality gate.
It has been my habit for a while to make the root commit 0000000 because it’s fun, but for some reason it had not occurred to me to generalise this to subsequent commits. Tempting, very tempting. I have a couple of solo-developed-and-publicly-shared projects in mind that I will probably do this for.
I bet I wasn't the first person who thought this would have to be done by modifying actual file content — e.g. a dummy comment or something. That would clearly have been horrible, but the fact that git bases the checksum off the commit message is... surprising and fortunate, in this case!
It's a hash of everything that goes into a commit, including the commit message. The idea is that nothing that makes up a commit can change without changing the hash.
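You can check that claim directly: a commit id is the SHA-1 of a small header plus the full commit object, message included (assumes a SHA-1 repo, which is still the default):

```shell
# Recompute HEAD's id by hand: sha1("commit <size>\0" + commit object).
git cat-file commit HEAD > c.txt
size=$(wc -c < c.txt | tr -d ' ')
printf 'commit %s\0' "$size" | cat - c.txt | sha1sum | cut -d' ' -f1
git rev-parse HEAD    # prints the same hash
```

Change any byte of the message (or tree, parents, author, committer) and the digest changes; that is why rewording a commit is enough to move its hash.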
I find tags to be a fairly useful way of providing a linear progression, but I guess that's no fun.
> but it can also mean to only allow merges in one direction, from feature branches into main, never the other way around. It kind of depends on the project.
That sounds like the Mainline Model, championed by Perforce[0]. It's actually fairly sensible.
You could automatically tag each pushed commit with a number drawn from a sequence, using a git post-update hook. The only problem is that this centralizes the process: it's not possible to have fully "blessed" commits without pushing them first. And that's how SVN works, too.
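A sketch of such a hook body (the `seq/` tag namespace is invented; a real deployment would also need locking against concurrent pushes):

```shell
#!/bin/sh
# post-update sketch: give every not-yet-numbered commit the next
# sequential tag, oldest first.
last=$(git tag -l 'seq/*' | sed 's|^seq/||' | sort -n | tail -n 1)
n=${last:-0}
for c in $(git rev-list --reverse --all --not --tags='seq/*'); do
  n=$((n + 1))
  git tag "seq/$n" "$c"
done
```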
> So we only have one option: testing many combinations of junk data until we can find one that passes our criteria.
I have a somewhat related interest in trying to find sentences that have low SHA-256 sums.
I made a Go client that searches for low-hash sentences and uploads winners to a scoreboard I put up at https://lowhash.com
I am not knowledgeable about GPU methods or crypto mining in general, I just tried to optimize a CPU-based method. Someone who knows what they are doing could quickly beat out all the sentences there.
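The CPU-only search amounts to a tiny loop (a sketch; the sentence template is made up, and for lowercase hex a plain sort order matches numeric order):

```shell
# Keep whichever candidate sentence hashes lowest.
lower() {  # true if $1 sorts before $2 (numeric order for lowercase hex)
  [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort | head -n 1)" = "$1" ]
}
best=ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
for i in $(seq 1 200); do
  s="low hashes are lucky, attempt $i"     # made-up sentence template
  h=$(printf '%s' "$s" | sha256sum | cut -d' ' -f1)
  if lower "$h" "$best"; then best=$h; win=$s; fi
done
printf '%s -> %s\n' "$win" "$best"
```

Real miners batch this in native code or on GPUs, which is why a shell loop like this would never compete on the scoreboard.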
The article talks about eight-character prefixes later on, but Git short hashes actually use seven-character prefixes when there is no collision (and that's what's shown earlier in the article). So you can divide the time by 16.
For me on a Ryzen 5800HS laptop, lucky_commit generally takes 11–12 seconds. I’m fine with spending that much per commit when publishing. The three minutes eight-character prefixes would require, not quite so much.
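The arithmetic behind that factor: each extra fixed hex character multiplies the expected number of hash attempts by 16.

```shell
echo $((1 << 28))   # 16^7 attempts expected for a 7-char prefix: 268435456
echo $((1 << 32))   # 16^8 for an 8-char prefix: 4294967296, i.e. 16x more
```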
[1]: https://github.com/zegl/extremely-linear/blob/main/shit
informalo|3 years ago
[1]: https://github.com/nvbn/thefuck
anderskaseorg|3 years ago
https://git-scm.com/docs/git-describe
thih9|3 years ago
Shouldn't "shit show 14" get converted to "git show 0000014"?
kinduff|3 years ago
I wonder about performance, though. Why is the author's method slower than the package I linked?
[0]: https://github.com/not-an-aardvark/lucky-commit
bloppe|3 years ago
https://git-scm.com/docs/git-gc#_configuration
Ayesh|3 years ago
I imagine stuff like this and SVN to Git mirroring to work nicely with identical hashes.
[0] https://www.perforce.com/video-tutorials/vcs/mainline-model-...