GitHub-style rebase-only PRs have turned out to be the best compromise between the 'preserve history' and 'linear history' strategies:
All PRs are rebased and merged in a linear history of merge commits that reference the PR#. If you intentionally crafted a logical series of commits, merge them as a series (ideally you've tested each commit independently), otherwise squash.
If you want more detail about the development of the PR than the merge commit, aka the 'real history', then open up the PR and browse through Updates, which include commits that were force-pushed to the branch and also fast-forward commits that were appended to the branch. You also get discussion context and intermediate build statuses etc. To represent this convention within native git, maybe tag each Update with pr/123/update-N.
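In native git, that tagging convention might look something like this (a sketch; the PR number, branch name, and tag names are all invented for the example):

```shell
# Hypothetical convention: snapshot each Update of PR #123 as a tag, so the
# pre-rebase 'real history' survives force-pushes.
git tag pr/123/update-1 my-feature        # branch tip as first pushed
# ...review happens, the branch is force-pushed...
git tag pr/123/update-2 my-feature
git push origin 'refs/tags/pr/123/*'      # publish the snapshots
git range-diff pr/123/update-1...pr/123/update-2   # compare the two versions
```

`git range-diff` then shows how each patch in the series changed between Updates, which is roughly what the PR's Updates view gives you.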
The funny thing about this design is that it's actually more similar to the kernel development workflow (emailing crafted patches around until they are accepted) than BOTH of the typical hard-line stances taken by most people with a strong opinion about how to maintain git history (only merge/only rebase).
What's weird about most of these discussions is how they're always seen as technical considerations distinct from the individuals who actually use the system.
The kernel needs a highly-distributed workflow because it's a huge organization of loosely-coupled sub-organizations. Most commercial software is developed by a relatively small group of highly-cohesive individuals. The forces that make a solution work well in one environment don't necessarily apply elsewhere.
With this, you can also push people towards smaller PRs, which are easier to review and integrate.
The downside is that if you need to work on feature 2 based on feature 1, either you wait for the feature 1 PR to be merged into main (the easiest approach), or you fork from your feature branch directly and will need to rebase later (this can get messier, especially if you need to fix errors in feature 1).
I want the 'merge' function completely deprecated. I simply don't trust it anymore.
If there are no conflicts, you might as well rebase or cherry-pick. If there is any kind of conflict, you are making code changes in the merge commit itself to resolve it. Developers end up fixing additional issues in the merge commit instead of in the actual commits.
If you use merge to sync two branches continuously, you completely lose track of which changes were done on the branch and which were done on the mainline.
I don't know how stupid this is on a scale from 1 to 10. I've created a wrapper [1] for git (called "shit", for "short git") that converts non-padded revisions to their padded counterpart.
Examples:
"shit show 14" gets converted to "git show 00000140"
"shit log 10..14" translates to "git log 00000100..00000140"
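The core of such a wrapper is just zero-padding; a minimal sketch (assuming 7-digit zero-padded prefixes, which may not match what the real script does):

```shell
# Sketch of a "short git": zero-pad bare revision numbers and A..B ranges,
# pass everything else through untouched. The 7-digit width is an assumption.
pad() { printf '%07d' "$1"; }
expand() {
  case "$1" in
    *..*) printf '%s..%s' "$(pad "${1%%..*}")" "$(pad "${1##*..}")" ;;
    ''|*[!0-9]*) printf '%s' "$1" ;;   # not a bare number: leave as-is
    *) pad "$1" ;;
  esac
}
args=
for a in "$@"; do args="$args $(expand "$a")"; done
git $args   # unquoted on purpose in this sketch; a real script needs more care
```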
Mercurial has always had sequential revision numbers in addition to hashes for every commit.
They aren't perfect, of course. All they indicate is in which order the current clone of the repo saw the commits. So two clones could pull the commits in different order and each clone could have different revision numbers for the same commits.
But they're still so fantastically useful. Even with their imperfections, you know that commit 500 cannot be a parent of commit 499. When looking at blame logs (annotate logs), you can be pretty sure that commit 200 happened some years before commit 40520. Plus, if your repo isn't big (and most repos on GitHub are not that big by number of commits), your revision numbers are smaller than even short git hashes, so they're easier to type in the CLI.
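Plain git can produce similar per-clone ordinals, with the same caveats (a sketch; `--first-parent` keeps the numbering reasonably stable on branchy histories):

```shell
# A per-clone ordinal: how many commits are reachable from a given one.
git rev-list --count HEAD
# Map an ordinal back to a hash, e.g. the 500th first-parent commit:
git log --reverse --first-parent --format=%H | sed -n '500p'
```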
Seems like a design fault in git that commits only have a single id (sha1 hash) and that hashes are written without any prefix indicating which type of id it is.
If all hashes were prefixed with "h", it would have been so simple to add another (secure) hash and a serial number.
E.g. h123456 for the sha1, k6543 for sha256 and n100 for the commit number.
I fail to see the point of this; in fact, I think it's a fundamentally flawed approach to dealing with your revision history. The problem is that rebasing commits can break the integrity of your commit history.
How are you going to deal with non-trivial feature branches that need to be integrated into master? Squash them and commit? Good luck when you need to git bisect an issue. Or rebase and potentially screw up the integrity of the unit test results in the rebased branch? Both sound unappealing to me.
The problem is not a history with a lot of branches in it; it is not knowing how to use your tools to present a view of that history that you are interested in and that is easy for you to understand.
> The problem is not a history with a lot of branches in it; it is not knowing how to use your tools to present a view of that history that you are interested in and that is easy for you to understand.
To me this is like saying to a construction worker: “The problem is not that your hammer has sharp spikes coming out of the handle at every angle. The problem is that you don’t put on a chain mail glove when using it.” That’s certainly one way to look at it.
This somewhat depends on how big your features are. Arguably, large long-lived feature branches are the problem themselves. If larger features are broken down and developed/merged piecemeal, then you still have smaller commits you can fall back on.
IIRC, GitHub uses a development model where partially implemented features are actually deployed to production, but hidden behind feature flags.
I'm pretty sure the point is that this is a one-person project and the author can play around. He's not suggesting that your team of 100 people adopt this for the development of a commercial product.
I think the fundamental misunderstanding people with your point of view have regarding linear commit histories is that it's not just about different VCS usage; the entire development process changes.
When you are using linear histories and rebasing you don't do monolithic feature branches. You land smaller chunks and gate their functionality via some configuration variable. `if (useNewPath) { newPath(); } else { oldPath(); }` and all your new incremental features land in `newPath`. All tests pass on both code paths and nothing breaks. When the feature is fully done then you change the default configuration to move to the `newPath`.
> How are you going to deal with non-trivial feature branches that need to be integrated into master?
That's the point -- this isn't a thing in rebase workflows. That's a feature. You don't have to deal with megapatches for massive features. It's incrementally verified along the way and bisect works flawlessly.
It is amazing how much time projects seem to spend on rewriting history for the goal of displaying it in a pretty way. Leaving history intact and having better ways to display it seems far saner. Even after a merge, history in the branch may be useful for bisect, etc.
Git does something called "packing" when it detects "approximately more than <X (configurable)> loose objects" in your .git/objects/ folder. The key word here is "approximately". It will guess how many total objects you have by looking in a few folders and assuming that the objects are uniformly distributed among them (these folders consist of the first 2 characters of the SHA-1 digest). If you have a bunch of commits in the .git/objects/00/ folder, as would happen here, git will drastically over- or under-approximate the total number of objects depending on whether that 00/ folder is included in the heuristic.
This isn't the end of the world, but something to consider.
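For reference, the relevant knob and a way to see the real counts (the sampling detail comes from git's gc implementation, which extrapolates from a single fan-out directory, objects/17/, times 256):

```shell
git config --default 6700 --get gc.auto   # the loose-object threshold (default 6700)
git count-objects -v                      # exact loose/packed counts, no sampling
# The directory gc actually samples:
ls .git/objects/17 2>/dev/null | wc -l
```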
I think the sweet spot in developer productivity was when we had SVN repos and used git-svn on the client. Commits were all rebased at the git level prior to pushing. If you committed something that broke unit tests your colleagues would pass you a really ugly plush animal of shame that would sit on your desk until the next coworker broke the build.
We performed code review with a projector in our office jointly looking at diffs, or emacs.
Of course it’s neat to have GitHub Actions now and pull requests for asynchronous code review. But I learned so much from my colleagues directly in that now-obscure working mode, and I’m still grateful for it.
> If you committed something that broke unit tests your colleagues would pass you a really ugly plush animal of shame that would sit on your desk until the next coworker broke the build.
We did have an ugly plush animal, but it served more obscure purposes. For blame of broken builds, we had an info screen that counted the number of times the build had passed, and below that displayed the name of the person who last broke it.
Explaining to outsiders and non-developers that "Yes, when you make a mistake in this department, we put the person's name on the wall and don't take it down until someone else makes a mistake" sounds so toxic. But it strangely enough wasn't so harsh. Of course there was some stigma that you'd want to avoid, but not to a degree of feeling prolonged shame.
I have my old team's rubber chicken and I'm never giving it up.
In-person code review is the only way to do it. Pull requests optimize for the wrong part of code review, so now everyone thinks it's supposed to be a quality gate.
It has been my habit for a while to make the root commit 0000000 because it’s fun, but for some reason it had not occurred to me to generalise this to subsequent commits. Tempting, very tempting. I have a couple of solo-developed-and-publicly-shared projects in mind that I will probably do this for.
I bet I wasn't the first person who thought this would have to be done by modifying actual file content — e.g. a dummy comment or something. That would clearly have been horrible, but the fact that git bases the checksum off the commit message is... surprising and fortunate, in this case!
It's a hash of everything that goes into a commit, including the commit message. The idea is that nothing that makes up a commit can change without changing the hash.
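You can check that claim directly: a commit id is the SHA-1 of a small header plus the full commit object, message included (assumes a SHA-1 repo, which is still the default):

```shell
# Recompute HEAD's id by hand: sha1("commit <size>\0" + commit object).
git cat-file commit HEAD > c.txt
size=$(wc -c < c.txt | tr -d ' ')
printf 'commit %s\0' "$size" | cat - c.txt | sha1sum | cut -d' ' -f1
git rev-parse HEAD    # prints the same hash
```

Change any byte of the message (or tree, parents, author, committer) and the digest changes; that is why rewording a commit is enough to move its hash.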
I find tags to be a fairly useful way of providing a linear progression, but I guess that's no fun.
> but it can also mean to only allow merges in one direction, from feature branches into main, never the other way around. It kind of depends on the project.
That sounds like the Mainline Model, championed by Perforce[0]. It's actually fairly sensible.
You could automatically tag each pushed commit with a number drawn from a sequence, using a git post-update hook. The only problem is that this centralizes the process: it's not possible to have fully "blessed" commits without pushing them first. And that's how SVN works, too.
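A sketch of such a hook body (the `seq/` tag namespace is invented; a real deployment would also need locking against concurrent pushes):

```shell
#!/bin/sh
# post-update sketch: give every not-yet-numbered commit the next
# sequential tag, oldest first.
last=$(git tag -l 'seq/*' | sed 's|^seq/||' | sort -n | tail -n 1)
n=${last:-0}
for c in $(git rev-list --reverse --all --not --tags='seq/*'); do
  n=$((n + 1))
  git tag "seq/$n" "$c"
done
```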
> So we only have one option: testing many combinations of junk data until we can find one that passes our criteria.
I have a somewhat related interest in trying to find sentences that have low SHA-256 sums.
I made a Go client that searches for low-hash sentences and uploads winners to a scoreboard I put up at https://lowhash.com
I am not knowledgeable about GPU methods or crypto mining in general, I just tried to optimize a CPU-based method. Someone who knows what they are doing could quickly beat out all the sentences there.
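The CPU-only search amounts to a tiny loop (a sketch; the sentence template is made up, and for lowercase hex a plain sort order matches numeric order):

```shell
# Keep whichever candidate sentence hashes lowest.
lower() {  # true if $1 sorts before $2 (numeric order for lowercase hex)
  [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort | head -n 1)" = "$1" ]
}
best=ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
for i in $(seq 1 200); do
  s="low hashes are lucky, attempt $i"     # made-up sentence template
  h=$(printf '%s' "$s" | sha256sum | cut -d' ' -f1)
  if lower "$h" "$best"; then best=$h; win=$s; fi
done
printf '%s -> %s\n' "$win" "$best"
```

Real miners batch this in native code or on GPUs, which is why a shell loop like this would never compete on the scoreboard.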
The article talks about eight-character prefixes later on, but Git short hashes actually use seven-character prefixes when there is no collision (and that's what's shown earlier in the article). So you can divide the time by 16.
For me on a Ryzen 5800HS laptop, lucky_commit generally takes 11–12 seconds. I’m fine with spending that much per commit when publishing. The three minutes eight-character prefixes would require, not quite so much.
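The arithmetic behind that factor: each extra fixed hex character multiplies the expected number of hash attempts by 16.

```shell
echo $((1 << 28))   # 16^7 attempts expected for a 7-char prefix: 268435456
echo $((1 << 32))   # 16^8 for an 8-char prefix: 4294967296, i.e. 16x more
```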
[1]: https://github.com/zegl/extremely-linear/blob/main/shit
informalo|3 years ago
[1]: https://github.com/nvbn/thefuck
anderskaseorg|3 years ago
https://git-scm.com/docs/git-describe
thih9|3 years ago
Shouldn't "shit show 14" get converted to "git show 0000014"?
kinduff|3 years ago
I wonder about performance, though. Why is the author's method slower than the package I linked?
[0]: https://github.com/not-an-aardvark/lucky-commit
bloppe|3 years ago
https://git-scm.com/docs/git-gc#_configuration
Ayesh|3 years ago
I imagine stuff like this and SVN to Git mirroring to work nicely with identical hashes.
[0] https://www.perforce.com/video-tutorials/vcs/mainline-model-...