There’s something that bothers me about these sorts of recollections that make git seem… inevitable.
There’s this whole creation myth of how Git came to be that kind of paints Linus as some prophet reading from golden tablets written by the CS gods themselves.
Granted, this particular narrative in the blog post does humanise the story a bit more, remembering the stumbling steps, how Linus never intended for git itself to be the UI, how there wasn’t even a git commit command in the beginning, but it still paints the whole thing in somewhat romantic tones, as if the blob-tree-commit-ref data structure were the perfect representation of data.
One particular aspect that often gets left out of this creation myth, especially by the author (a co-founder of GitHub), is that Mercurial had a prominent role. It was created by Olivia Mackall, another kernel hacker, at the same time as git, for the same purpose as git. Olivia offered Mercurial to Linus, but Linus didn’t look upon it with favour and stuck to his guns. Unlike git, Mercurial had a UI from the very start. Its UI was very similar to Subversion, which at the time was the dominant VCS, so Mercurial always aimed for familiarity without sacrificing user flexibility. In the beginning, both VCSes had mindshare, and even today, the mindshare of Mercurial lives on in hg itself as well as in worthy git successors such as jujutsu.
And the git data structure isn’t the only thing that could have ever possibly worked. It falls apart for large files. There are workarounds and things you can patch on top, but there are also completely different data structures that would be appropriate for larger bits of data.
Git isn’t just plain wonderful, and in my view, it’s not inevitable either. I still look forward to a world beyond git, whether jujutsu or whatever else may come.
I'm curious why you think hg had a prominent role in this. I mean, it did pop up at almost exactly the same time for exactly the same reasons (BK, kernel drama) but I don't see evidence of Matt's benchmarks or development affecting the Git design decisions at all.
Here's one of the first threads where Matt (Olivia) introduces the project and benchmarks, but it seems like the list finds it unremarkable enough comparatively to not dig into it much:
https://lore.kernel.org/git/Pine.LNX.4.58.0504251859550.1890...
I agree that the UI is generally better and some decisions were arguably better (changeset evolution, which came much later, is pretty amazing) but I have a hard time agreeing that hg influenced Git in some fundamental way.
A lot of the ideas around git were known at this time. People mentioned monotone already. Still, Linus got the initial design wrong by computing the hash of the compressed content (which is a performance issue and also would make it difficult to replace the compression algorithm). Something I had pointed out early [1] and he later changed it.
I think the reason git was successful back then is that it is a small, practical, and very efficient no-nonsense tool written in C. This made it much more appealing to many than the alternatives written in C++ or Python.
[1]: https://marc.info/?l=git&m=111366245411304&w=2
I do think an open source, distributed, content addressable VCS was inevitable. Not git itself, but something with similar features/workflows.
Nobody was really happy with the VCS situation in 2005. Most people were still using CVS, or something commercial. SVN did exist - it had only just reached version 1.0 in 2004 - but platforms like SourceForge still only offered CVS hosting. SVN was considered to be a more refined CVS, but it wasn't that much better and still shared all the same fundamental flaws from its centralised nature.
On the other hand, "distributed" was a hot new buzzword in 2005. The recent success of Bittorrent (especially its hot new DHT feature) and other file sharing platforms had pushed the concept mainstream.
Even if it hadn't been for the BitKeeper incident, I do think we would have seen something pop up by 2008 at the latest. It might not have caught on as fast as git did, but you must remember that the thing that shot git to popularity was GitHub, not the Linux kernel.
> There’s this whole creation myth of how Git came to be that kind of paints Linus as some prophet reading from golden tablets written by the CS gods themselves.
Linus absolutely had a couple of brilliant insights:
1. Content-addressable storage for the source tree.
2. Files do not matter: https://gist.github.com/borekb/3a548596ffd27ad6d948854751756...
At that time, I was using SVN and experimenting with Hg and Bazaar. Both were too "magical" for me, with unclear rules for merging, branching, rebasing.
Then came git. I read its description "source code trees, identified by their hashes, with file content movement deduced from diffs", and it immediately clicked. It's such an easy mental model, and you can immediately understand what operations mean.
Another alternative is the patch-theory approach from Darcs and now Pijul. It's a fundamentally different way of thinking about version control—I haven't actually used it myself but, from reading about it, I find thinking in patches matches my natural intuition better than git's model. Darcs had some engineering limitations that could lead to really bad performance in certain cases, but I understand Pijul fixes that.
The article is written by a co-founder of GitHub, not Linus Torvalds.
git is just a tool to do stuff. Its name (chosen by that Finnish bloke) is remarkably apt - it's for gits!
It's not Mercurial, nor GitHub, nor is it anything else. It's git.
It wasn't invented for you or you or even you. It was a hack to do a job: sort out control of the Linux kernel source when BitKeeper went off the rails as far as the Linux kernel devs were concerned.
It seems to have worked out rather well.
> there are also completely different data structures that would be appropriate for larger bits of data.
Can you talk a little bit about this? My assumption was that the only way to deal with large files properly was to go back to centralised VCS, I'd be interested to hear what different data structures could obviate the issue.
In the early 2000s, I was researching VCSs for work and also helping a little with developing arch, bazaar, then (less so) bzr. I trialed BitKeeper for work. We went with Subversion eventually. I think I tried Monotone but it was glacially slow. I looked at Mercurial. It didn't click.
When I first used Git I thought YES! This is it. This is the one. The model was so compelling, the speed phenomenal.
I never again used anything else unless forced -- typically Subversion, mostly for inertia reasons.
> There’s this whole creation myth of how Git came to be that kind of paints Linus as some prophet reading from golden tablets written by the CS gods themselves.
What?
> Git isn’t just plain wonderful, and in my view, it’s not inevitable either.
I mean, the proof is in the pudding. So why did we end up with Git? Was it just dumb luck? Maybe. But I was there at the start for both Git and Mercurial (as I comment elsewhere in this post). I used them both equally at first, and as a Python aficionado should've gravitated to Mercurial.
But I like to understand how tools work, and I personally found Mercurial harder to understand, slower to use, and much less flexible. It was great for certain workflows, but if those workflows didn't match what you wanted to do, it was rigid (I can't really expound on this; it's been more than a decade). Surprisingly (as I was coding almost entirely in Python at the time), I also found it harder to contribute to than Git.
Now, I'm just one random guy, but here we are, with the not plain wonderful stupid (but extremely fast) directory content manager.
>And the git data structure... falls apart for large files.
I'm good with this. In my over 25 years of professional experience, having used cvs, svn, perforce, and git, it's almost always a mistake keeping non-source files in the VCS. Digital assets and giant data files are nearly always better off being served from artifact repositories or CDN systems (including in-house flavors of these). I've worked at EA Sports and Rockstar Games and the number of times dev teams went backwards in versions with digital assets can be counted on the fingers of a single hand.
I just wish they'd extend git to have better binary file diffs and moved file tracking.
Remembering the real history matters, because preserving history is valuable by itself, but I'm also really glad that VCS is for most people completely solved, there's nothing besides Git you have to pay attention to, you learn it once and use it your whole career.
I was always under the impression Monotone - which was released two years before Mercurial - was the inspiration for git, and that this was pretty well known.
The file-tree-snapshot-ref structure is pretty good, but it lacks chunking at the file and tree layers, which makes it inefficient with large files and trees that don't change a lot. Modern backup tools like restic/borg/etc use something similar, but with chunking included.
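To make the chunking idea concrete, here is a toy sketch in Python (it uses a simple rolling sum as the boundary test; real tools such as restic and borg use stronger rolling hashes like buzhash or Rabin fingerprints, and every name below is made up for illustration):

    import hashlib

    WINDOW = 64            # bytes in the rolling window
    MASK = (1 << 13) - 1   # aim for roughly 8 KiB average chunks
    MIN_CHUNK = 2048       # avoid degenerate tiny chunks

    def chunks(data: bytes):
        # Cut points depend on the content itself, so inserting bytes near
        # the start of a file only disturbs nearby chunks instead of
        # shifting every later boundary (which is what whole-file or
        # fixed-size-block hashing suffers from).
        start = 0
        rolling = 0
        for i, byte in enumerate(data):
            rolling += byte
            if i >= WINDOW:
                rolling -= data[i - WINDOW]   # slide the window forward
            if (rolling & MASK) == MASK and i + 1 - start >= MIN_CHUNK:
                yield data[start:i + 1]
                start = i + 1
        if start < len(data):
            yield data[start:]

    def store(data: bytes, pool: dict) -> list:
        # Deduplicate: each chunk is stored once under its own hash, and a
        # file becomes just a list of chunk hashes (a "recipe").
        recipe = []
        for chunk in chunks(data):
            key = hashlib.sha256(chunk).hexdigest()
            pool.setdefault(key, chunk)
            recipe.append(key)
        return recipe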
Around 2002 or so, I had an idea to tag every part of a project with a unique hash code. With a hash code, one could download the corresponding file. A hash code for the whole project would be a file containing a list of hash codes for the files that make up the project. Hash codes could represent the compiler that builds it, along with the library(s) it links with.
I showed it to a couple software entrepreneurs (Wild Tangent and Chromium), but they had no interest in it.
I never did anything else with it, and so it goes.
I had actually done a writeup on it, and thought I had lost it. I found it, dated 2/15/2002:
---
Consider that any D app is completely specified by a list of .module files and the tools necessary to compile them. Assign a unique GUID to each unique .module file. Then, an app is specified by a list of .module GUIDs. Each app is also assigned a GUID.
On the client's machine is stored a pool of already downloaded .module files. When a new app is downloaded, what is actually downloaded is just a GUID. The client sees if that GUID is an already built app in the pool, then he's done. If not, the client requests the manifest for the GUID, a manifest being a list of .module GUIDs. Each GUID in the manifest is checked against the client pool, any that are not found are downloaded and added to the pool.
Once the client has all the .module files for the GUIDs that make up an app, they can all be compiled, linked, and the result cached in the pool.
Thus, if an app is updated, only the changed .module files ever need to get downloaded. This can be taken a step further and a changed .module file can be represented as a diff from a previous .module.
Since .module files are tokenized source, two source files that differ only in comments and whitespace will have identical .module files.
There will be a master pool of .module files on WT's server. When an app is ready to release, it is "checked in" to the master pool by assigning GUIDs to its .module files. This master pool is what is consulted by the client when requesting .module files by GUID.
The D "VM" compiler, linker, engine, etc., can also be identified by GUIDs. This way, if an app is developed with a particular combination of tools, it can specify the GUIDs for them in the manifest. Hence the client will automatically download "VM" updates to get the exact tools needed to duplicate the app exactly.
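As a toy illustration of that download step, here is a sketch with plain dicts standing in for the server's master pool and the client's local pool (the names are mine, not from the write-up):

    server_pool = {}   # GUID -> module bytes, or GUID -> manifest (list of GUIDs)
    client_pool = {}   # locally cached objects

    def install(app_guid):
        # Fetch an app by GUID, transferring only what is not already cached.
        if app_guid not in client_pool:
            client_pool[app_guid] = server_pool[app_guid]   # the manifest
        modules = []
        for module_guid in client_pool[app_guid]:
            if module_guid not in client_pool:              # missing locally?
                client_pool[module_guid] = server_pool[module_guid]
            modules.append(client_pool[module_guid])
        return modules   # everything needed to compile, link, and cache the app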
Your description (including the detailed description in the reply) seems to be missing the crucial difference that git uses - the hash code of the object is not some GUID, it is literally the hash of the content of the object. This makes a big difference as you don't need some central registry that maps the GUID to the object.
https://en.wikipedia.org/wiki/Merkle_tree
Except that instead of a GUID, it's just a hash of the binary data itself, which ends up being more useful because it is a natural key and doesn't require storing a separate mapping.
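Concretely, git names a blob by hashing a small header plus the raw content, so identical content always gets the same id and no registry is needed - a quick sketch:

    import hashlib

    def blob_id(content: bytes) -> str:
        # Git stores a blob as b"blob <size>\0<content>" and uses the SHA-1
        # of those bytes as the object's name.
        header = b"blob %d\0" % len(content)
        return hashlib.sha1(header + content).hexdigest()

    # Same bytes, same id, on any machine; `git hash-object` computes the
    # same value for the same input.
    print(blob_id(b"hello world\n"))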
> I started using Git for something you might not imagine it was intended for, only a few months after its first commit
I started using git around 2007 or so because that company I worked for at the time used ClearCase, without a doubt the most painful version manager I have ever used (especially running it from a Linux workstation). So I wrote a few scripts that would let me mirror a directory into a git repo, do all my committing in git, then replay those commits back to ClearCase.
I can't recall how Git came to my attention in the first place, but by late 2008 I was contributing patches to Git itself. Junio was a kind but exacting maintainer, and I learned a lot about contributing to open source from his stewardship. I even attended one of the early GitTogethers.
As far as I can recall, I've never really struggled with git. I think that's because I like to dissect how things work, and under the covers git is quite simple. So I never had too much trouble with its terribly baroque CLI.
At my next job, I was at a startup that was building upon a fork of Chromium. At the time, Chromium was using subversion. But at this startup, we were using git, and I was responsible for keeping our git mirror up-to-date. I also had the terrible tedious job of rebasing our fork with Chromium's upstream changes. But boy did I get good at resolving merge conflicts.
Git may be the CLI I've used most consistently for nearly two decades. I'm disappointed that GitHub became the main code-review tool for Git, but I'll never be disappointed that Git beat out Mercurial, which I always found overly rigid and was never able to adapt to my workflow.
> I started using git around 2007 or so because that company I worked for at the time used ClearCase, without a doubt the most painful version manager I have ever used
Ah, ClearCase! The biggest pain was in your wallet! I saw the prices my company paid per-seat for that privilege -- yikes!
As someone who wrote my first line of code in approx 2010 and used git & GH for the first time in… 2013? it kind of amazes me to remember that Git is only 20 years old. GitHub for instance doesn’t seem surprising to me that is <20 years old, but `git` not existing before 2005 somehow always feels shocking to me. Obviously there were other alternatives (to some extent) for version control, but git just has the feeling of a tool that is timeless and so ingrained in the culture that it is hard to imagine (for me) the idea of people being software developers in the post-mainframe age without it. It feels like something that would have been born in the same era as Vim, SSH, etc (ie early 90s). This is obviously just because from the perspective of my programming consciousness beginning, it was so mature and entrenched already, but still.
I’ve never used other source control options besides git, and I sometimes wonder if I ever will!
What surprises me more is how young Subversion is in comparison to git; it's barely older.
I guess I started software dev at a magic moment pre-git but after SVN was basically everywhere, yet SVN felt even more like it had been around forever compared to the upstart git.
Small remark:
> As far as I can tell, this is the first time the phrase “rebase” was used in version control
ClearCase (which I had the displeasure to use) has been using the term "rebase" as well. Googling "clearcase rebase before:2005" finds [0] from 1999.
(By the way, a ClearCase rebase was literally taking up to half an hour on the codebase I was working on - in 2012; instant git rebases blew my mind.)
[0] https://public.dhe.ibm.com/software/rational/docs/documentat...
Good pull. I was wondering if that was a true statement or not. I am curious if Linus knew about that or made it up independently, or if both came from somewhere else. I really don't know.
> He meant to build an efficient tarball history database toolset, not really a version control system. He assumed that someone else would write that layer.
Famous last words: "We'll do it the right way later!"
On the flip side: when you do intend to make a larger project like that, consciously focusing on the internal utility piece first is often a good move. For example, Pip doesn't offer a real API; anyone who wants their project to install "extra" dependencies dynamically is expected to (https://stackoverflow.com/questions/12332975) run it as a subprocess with its own command line. I suspect that maintaining Pip nowadays would be much easier if it had been designed from that perspective first, which is why I'm taking that approach with Paper.
FWIW, I just found out you can sign commits using ssh keys. Due to how pinentry + gnupg + git has issues on OpenBSD with commit signing, I just moved to signing via ssh. I had a workaround, but it was a real hack, now no issues!
20 years - wow, it seems like yesterday that I moved my work items from cvs to git. I miss one item from cvs ($Id$), but I learned to do without it.
Oh yeah, SSH signing is incredible. I've also migrated to it and didn't look back.
A couple of differences:
- it's possible to specify signing keys in a file inside the repository, and configure git to verify on merge (https://github.com/wiktor-k/ssh-signing/). I'm using that for my dot config repo to make sure I'm pulling only stuff I committed on my machines.
- SSH has TPM key support via PKCS#11 or external agents, which makes it possible to easily roll out hardware-backed keys
- SSH signatures have context separation, that is, it's not possible to take your SSH commit signature and repurpose it (unlike OpenPGP)
- due to SSH keys being small the policy file is also small and readable, compare https://github.com/openssh/openssh-portable/blob/master/.git... with the equivalent OpenPGP https://gitlab.com/sequoia-pgp/sequoia/-/blob/main/openpgp-p...
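For reference, roughly the configuration involved - a sketch assuming git 2.34 or newer, with example key paths:

    # sign commits with an SSH key instead of GnuPG
    git config --global gpg.format ssh
    git config --global user.signingkey ~/.ssh/id_ed25519.pub
    git config --global commit.gpgsign true
    # lets git map signers to keys when verifying, e.g. `git log --show-signature`
    git config --global gpg.ssh.allowedSignersFile ~/.ssh/allowed_signers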
AFAIR keyword substitution of $Id$ included the revision number. That would be the commit hash in Git. For obvious reasons you cannot insert a hash value in content from which that hash value is being computed.
Thanks for the useful article!
In addition to a lot of interesting info, it led me to this repo containing an intro to git internals[1]. I'd highly recommend everyone take a look.
[1] https://github.com/pluralsight/git-internals-pdf
Ah yes. It was pretty cool that when Peepcode was acquired, Pluralsight asked me what I wanted to do with my royalties there and was fine with me waiving them and just open-sourcing the content.
It also is a testament to the backwards compatibility of Git that even after 17 years, most of the contents of that book are still relevant.
This is actually the part I would be interested in, coming from a GitHub cofounder.
You'll be the first to know when I write it. However, if anything, GitHub sort of killed the mailing list as a generally viable collaboration format outside of very specific use cases, so I'm not sure if I'm the right person to do it justice. However, it is a very cool and unique format that has several benefits that GitHub PR based workflows really lose out on.
> patches and tarballs workflow is sort of the first distributed version control system - everyone has a local copy, the changes can be made locally, access to "merge" is whomever can push a new tarball to the server.
Nitpick, but that's not what makes it a distributed workflow. It's distributed because anyone can run patch locally and serve the results themselves. There were well-known alternative git branches back then, like the "mm" tree run by Andrew Morton.
The distributed nature of git is one of the most confusing aspects for people who have been raised on centralised systems. The fact that your "master" is different to my "master" is something people have difficulty with. I don't think it's a huge mental leap, but too many people just start using GitLab etc. without anyone telling them.
since it seems it has been forgotten, remember the reason Git was created is that Larry McVoy, who ran BitMover, which had been donating proprietary software licenses for BitKeeper to core kernel devs, got increasingly shirty at people working on tools to make BK interoperate with Free tools, culminating in Tridge showing in an LCA talk that you could telnet to the BK server and it would just spew out the whole history as SCCS files.
Larry shortly told everyone he wasn't going to keep giving BK away for free, so Linus went off for a weekend and wrote a crappy "content manager" called git, on top of which perhaps he thought someone might write a proper VC system.
And here we are.
A side note was someone hacking the BitKeeper-CVS "mirror" (a linear-ish approximation of the BK DAG) with probably the cleverest backdoor I'll ever see: https://blog.citp.princeton.edu/2013/10/09/the-linux-backdoo...
See if you can spot the small edit that made this a backdoor:
    if ((options == (__WCLONE|__WALL)) && (current->uid = 0)) retval = -EINVAL;
I think that was the first time I ever saw Tridge deliver a conference presentation and it was to a packed lecture theatre at the ANU.
He described how he 'hacked' BitKeeper by connecting to the server via telnet and using the sophisticated hacker tools at his disposal to convince BitKeeper to divulge its secrets. He typed:
    help
The room erupted with applause and laughter.
Are you worried about hash collisions from different objects? The probability of a collision among N distinct objects with SHA-1 is about (N choose 2) / 2^160. For a trillion objects that's roughly 3.4 x 10^-25. I think we can safely write code without collisions until the sun goes supernova.
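The arithmetic, for the curious (a birthday bound over SHA-1's 160-bit space):

    from math import comb

    n = 10**12                   # a trillion distinct objects
    p = comb(n, 2) / 2**160      # expected-collision (birthday) bound
    print(p)                     # ~3.4e-25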
I don't know if this is a recent addition, but I recently started using it: `git worktree` is awesome. It lets you have more than one copy of the repository checked out at different points without doing the stash dance.
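For anyone who hasn't tried it, roughly how it looks (the paths and branch name here are just examples):

    # check out a second branch in its own directory, sharing the same object store
    git worktree add ../myrepo-hotfix hotfix-branch
    # ...hack and commit over there independently of your main checkout...
    git worktree list
    git worktree remove ../myrepo-hotfix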
Because GitHub was better than Bitbucket? Or maybe because of the influence of kernel devs?
GitHub executed better than Bitbucket. And the Ruby community adopted GitHub early. Comparisons around 2010 said GitHub's UX and network effects were top reasons to choose Git. Mercurial's UX and Windows support were top reasons to choose Mercurial.
For me it won (10+ years ago) because, for some reason, git (deeply Linux-oriented software) had better Windows support than Mercurial (which boasted about its Windows support). You could even add files with names in various writing systems to git. I am not sure that Mercurial can do that even now.
GitHub was more popular than Bitbucket, so git unfortunately won.
https://xkcd.com/1597/
And it isn't as if I haven't used RCS, SCCS, CVS, ClearCase, TFS, Subversion, Mercurial before having to deal with Git.
Speaking of git front ends, I want to give a shout-out to Jujutsu. I suspect most people here have probably at least heard of it by now, but it has fully replaced my git cli usage for over a year in my day to day work. It feels like the interface that jj provides has made the underlying git data structures feel incredibly clear to me, and easy to manipulate.
Once you transition your mental model from working branch with a staging area to working revision that is continuously tracking changes, it's very hard to want to go back.
I think you need to check your history. In the early days, before closed/proprietary software, source code was often shared between programmers.
Let's look at text editors. They did not begin "closed source" - but companies have a team of people to help SELL their products, even if they are inferior to what's already out there.
Basically, once computers matured there was an opportunity to make a buck. Companies started selling their products for a fee. I would not be surprised if the source code was included before someone realised people can just pay for the executable. More money can be made by excluding the source code so new updates can also be for a fee.
(Let's not talk about "End User Licence Agreements" in this post, OK.)
The "dominance" of closed source is really about companies with money controlling the status quo, with lawyers and sales teams knowing how to push it in their favour.
Companies like Micro$oft today have so much money they dictate the direction our computer systems are going. They push it in a direction that favours them. They have a hand controlling the flow, like other big companies having a hand trying to steer the stream for their own intents and purposes.
This is why - whether you love him or hate him - I have much respect for people like Richard Stallman or others like Linus Torvalds, to name a few!
You want to talk about "innovation"? What do you think these "closed source innovations" are built with? Software is created using a programming language such as Python, C++, C, Javascript, etc. - the VAST MAJORITY being free to use, under some sort of open source community!
Let's also look at large companies in general, many of which are not innovating - they are just purchasing smaller companies that are trying to do new things, closed source software or not.
Lastly, let's also be honest that innovation is not created out of thin air - everything is inspired by something previous, whether a failed experiment or something already successful. New ideas come about through more failures, which lead to further ideas until, eventually, we have something successful!
Linux may be inspired by other operating systems, and those were inspired by other things, etc. Innovation is progressive. The point I am making is: if any company found ANY opportunity to build a closed-source version to shut down a popular open source equivalent, THEY WOULD DO IT!
We should say thank you to the greedy BitKeeper VCS owners, who wanted Linus Torvalds to pay them money for keeping the Linux source in their system. They managed to piss off Linus sufficiently, so he sat down and created Git.
I think the git usage patterns we've developed and grown accustomed to are proving inadequate for AI-first development. Maybe under the hood it will still be git, but the DX needs a huge revamp.