top | item 32880188

(no title)

djhaskin987 | 3 years ago

It is precisely because git stores a graph of snapshots that it is so hard to scale it to large monorepos with thousands of files. Every single commit stores a reference to the content of every single file. The duality mentioned in the article, storing changesets instead, is an interesting trade-off: instead of having to compute the diffs, you have to compute the snapshots. This is a good trade-off if you want only a small portion of the snapshot checked out to your machine. This is why Perforce does better with monorepos.

Storing changesets can handle very, very large sets of files much more easily, but you pay the price of having to compute which file content is stored when, which lengthens checkout time even in the small. It is not a good trade-off if you have to check out the entire thing anyway. This is why git is more popular with the open source community, which is more like a bazaar than a cathedral.


slondr|3 years ago

Mercurial has the obvious correct answer to this problem: Store diffs with snapshot checkpoints. That is, store diffs, but once you reach a critical number of diffs, store a snapshot instead of a diff so you can efficiently compute any specific commit state without storing snapshots for every commit.
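To make the idea concrete, here is a toy sketch in Python of "diffs with snapshot checkpoints". It is not Mercurial's actual storage format; the `RevLog` class, its method names, and the `snapshot_every` threshold are all made up for illustration. A revision is stored as a line-level delta against the previous one, unless the current delta chain has already reached the threshold, in which case a full snapshot is stored.

```python
import difflib


class RevLog:
    """Toy revision store (hypothetical, not Mercurial's real format)."""

    def __init__(self, snapshot_every=4):
        self.snapshot_every = snapshot_every  # assumed threshold
        self.entries = []  # ("snapshot", lines) or ("delta", ops)

    def _chain_length(self):
        # Number of deltas stored since the most recent snapshot.
        n = 0
        for kind, _ in reversed(self.entries):
            if kind == "snapshot":
                break
            n += 1
        return n

    def commit(self, lines):
        lines = list(lines)
        if not self.entries or self._chain_length() >= self.snapshot_every:
            # Chain is long enough: store a full checkpoint.
            self.entries.append(("snapshot", lines))
        else:
            base = self.checkout(len(self.entries) - 1)
            sm = difflib.SequenceMatcher(a=base, b=lines, autojunk=False)
            # Keep only the changed ranges: (start, end, replacement lines).
            ops = [(i1, i2, lines[j1:j2])
                   for tag, i1, i2, j1, j2 in sm.get_opcodes()
                   if tag != "equal"]
            self.entries.append(("delta", ops))
        return len(self.entries) - 1

    def checkout(self, rev):
        # Walk back to the nearest snapshot, then replay deltas forward.
        start = rev
        while self.entries[start][0] != "snapshot":
            start -= 1
        lines = list(self.entries[start][1])
        for _, ops in self.entries[start + 1:rev + 1]:
            # Apply back to front so earlier indices stay valid.
            for i1, i2, new in reversed(ops):
                lines[i1:i2] = new
        return lines


log = RevLog(snapshot_every=3)
log.commit(["a", "b"])
log.commit(["a", "b", "c"])
print(log.checkout(1))
```

Reconstructing any revision replays at most `snapshot_every` deltas, so checkout cost stays bounded no matter how long the history gets.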

Dylan16807|3 years ago

> Mercurial has the obvious correct answer to this problem: Store diffs with snapshot checkpoints.

That's how the git backend works for most of its storage.

It's a fool's errand to look at the conceptual model and start making claims about performance.

actionfromafar|3 years ago

Ah, so CVS is like GIF, git is like MPEG without keyframes, and Mercurial is like MPEG with keyframes.

kevincox|3 years ago

I don't think this makes large repos fundamentally hard. You just need good support for working on incomplete graphs. For example you just need to know the tree IDs/hashes of the non-checked-out trees in the directories that you have checked out. Then you graft your checked out directories onto the tree of the parent.
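A rough Python sketch of what I mean, under simplifying assumptions: the hashing scheme below is invented for the example and is not git's real object format, and `Unfetched`, `blob_id`, and `tree_id` are hypothetical names. The point is only that a subtree you have not checked out can be represented by its ID alone, and the parent tree's ID still comes out the same.

```python
import hashlib


class Unfetched:
    """Placeholder for a subtree known only by its ID (not checked out)."""

    def __init__(self, known_id):
        self.known_id = known_id


def blob_id(data: bytes) -> str:
    # Simplified content hash; git's real blob format differs.
    return hashlib.sha256(b"blob\0" + data).hexdigest()


def tree_id(tree) -> str:
    # A tree is a dict: name -> bytes (file), dict (subtree), or Unfetched.
    if isinstance(tree, Unfetched):
        return tree.known_id  # no need to look inside
    h = hashlib.sha256(b"tree\0")
    for name in sorted(tree):
        entry = tree[name]
        if isinstance(entry, bytes):
            kind, entry_id = "blob", blob_id(entry)
        else:
            kind, entry_id = "tree", tree_id(entry)
        h.update(f"{kind} {name} {entry_id}\n".encode())
    return h.hexdigest()


# Full repository, as a server would see it.
full = {
    "README": b"hello\n",
    "docs": {"guide.md": b"lots of docs\n"},
    "src": {"main.c": b"int main(void) { return 0; }\n"},
}

# Sparse checkout: only src/ is materialized; docs/ is known by ID only.
sparse = {
    "README": full["README"],
    "docs": Unfetched(tree_id(full["docs"])),
    "src": dict(full["src"]),
}

# Edit a checked-out file on both sides; the root IDs still agree.
full["src"]["main.c"] = b"int main(void) { return 1; }\n"
sparse["src"]["main.c"] = b"int main(void) { return 1; }\n"
print(tree_id(sparse) == tree_id(full))
```

So a client can build a valid new root tree after editing `src/` without ever downloading the contents of `docs/`.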