
Unorthodocs: Abandon Your DVCS and Return to Sanity

58 points | gecko | 11 years ago | bitquabit.com

31 comments

[+] aggieben|11 years ago|reply
I take some of his points, but this seems kinda...a couple years late here. It seems to me that Git/Mercurial just became the best centralized systems; being distributed fixed some critical faults with being centralized, and didn't really introduce that much more complexity, IMO. In most use-cases, you're talking about one extra porcelain command in comparison to a server-connected client like SVN or TFS, assuming you push every single changeset.
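That "one extra porcelain command" is easy to make concrete. A minimal sketch, using a throwaway local bare repository as a stand-in for the central server (the paths and identity below are illustrative, not any real setup):

```shell
# Where SVN's `svn commit` talks to the server in one step, Git splits the
# same act into a local commit plus a push. Everything here runs against a
# temporary bare repo standing in for the server.
set -e
central=$(mktemp -d)/central.git
git init -q --bare "$central"
work=$(mktemp -d)/work
git clone -q "$central" "$work" 2>/dev/null   # fresh (empty) clone
cd "$work"
git config user.email dev@example.com         # hypothetical identity
git config user.name Dev
echo 'hello' > file.txt
git add file.txt
git commit -qm 'Add file'                     # step 1: record locally
git push -q origin HEAD                       # step 2: the "one extra" command
```

In day-to-day use the difference really is just the extra `git push`; the rest of the ceremony above is only there to make the sketch self-contained.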

I also think being distributed encourages small chunks of work, where being centralized encourages the batcave approach, and that, I think, is a much bigger differentiator than most people realize. It impacts everything. Suddenly, merging doesn't seem as bad, and then suddenly being able to easily and cheaply create branches is more important. Centralized systems never really got there because the client/server nature works against them. I'm glad to see some big players giving them attention again, because one of the author's best points is that being centralized has certain advantages. It's just that the existing centralized systems suck.

I also think the GitHub workflow comparison to the email list + patch days is a bit contrived because the email list procedure was never that simple (well, for getting ignored maybe it was, but the point was to contribute, not be ignored). I, too, remember those days.

Steps 1&2 under GitHub aren't really harder than step 1 under CVS (or what have you). He just neglected to mention all the steps involving fiddling with CVS connection parameters, or finding the right URL for a SVN repository (trunk? All the branches? Do I care about tags? Good luck if they aren't using the standard layout).

Github #3 also belongs in the first list. Goodness...have you ever read the OpenBSD mailing list? It's about as friendly as EFnet on PMS.

I could go on, but the point is that the number of steps for CVS+mailing list isn't really different than with GitHub.

[+] rlpb|11 years ago|reply
The reason git works so well is because it models real life. When you check out a tree to work on it, the moment you change a file in your editor you have forked the tree. Whether you save the changes or not, it's a fork, even if a temporary one. More than that, it's a distributed fork. What your undo buffer in your editor does, or your local filesystem does, or your VCS does, is merely an implementation detail.

It works well to be using a tool where the changes you make in your editor are unified with the more formal changes you'll be pushing back upstream. It makes the whole process smoother, and means that all editing involves an identical workflow.

Even with Subversion, for example: if you make a local change in a file concurrent to something being changed in the repository, then what you have is a diverging branch. Subversion will try to auto-merge when your commit fails and you update your working tree (IIRC). Just because Subversion neither calls your working tree a branch nor your update a merge doesn't stop them being so.
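The "your working copy is already a diverging branch" point can be demonstrated directly. A sketch using two Git clones as stand-ins for two Subversion working copies (repos, file contents, and identities are all throwaway):

```shell
set -e
central=$(mktemp -d)/central.git
git init -q --bare "$central"
alice=$(mktemp -d)/alice
git clone -q "$central" "$alice" 2>/dev/null
cd "$alice"
git config user.email alice@example.com
git config user.name Alice
printf '1\n2\n3\n4\n5\n6\n7\n' > f.txt
git add f.txt; git commit -qm init; git push -q origin HEAD
bob=$(mktemp -d)/bob
git clone -q "$central" "$bob"
cd "$bob"
git config user.email bob@example.com
git config user.name Bob
printf '1\n2\n3\n4\n5\n6\nseven\n' > f.txt    # Bob edits the last line
git commit -qam 'bob: edit line 7'; git push -q origin HEAD
cd "$alice"
printf 'one\n2\n3\n4\n5\n6\n7\n' > f.txt      # Alice edits the first line
git commit -qam 'alice: edit line 1'
# The histories have diverged; "update" is literally a merge:
GIT_MERGE_AUTOEDIT=no git pull -q --no-rebase origin HEAD
```

Both edits survive in the merged file. Subversion performs the same content-level merge on `svn update`; it just doesn't call your working copy a branch or the update a merge.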

My point is that if you're changing code that is also being worked on elsewhere, you are using a DVCS whether you like it or not. Even if you aren't even using a specific DVCS tool. You have a DVCS the moment two people work on the same code base concurrently. So you might as well use a tool that integrates with the workflow that you already have.

[+] DominikD|11 years ago|reply
This is not real life, this is a workflow, one of many. It happens to mirror mine though and, perhaps surprisingly, I can easily achieve the same workflow with centralized VCS by using branches. It would be a major pain in CVS but works just fine for me (and others) in Perforce. So it can be done.

What's interesting is what "it" means, and that you're proving one of the points the author is making: your workflow doesn't rely on the fact that your VCS is decentralized. It relies on the fact that branching/forking is easy. Painful branching is largely a matter of legacy: CVS and SVN were architected in a different era, when our needs and the technology were very different.

[+] strictnein|11 years ago|reply
This may be strange to the author, but many of us experience Git completely outside of the realm of Github. Many of his complaints are fairly irrelevant then.
[+] wallyhs|11 years ago|reply
The complaints as I read them are:

* Large repository size because of blobs, large histories, or many files

* Difficulty of using git

* Pull requests aren't easier than patch bombs

Only the last one has to do with Github.

[+] tghw|11 years ago|reply
Maybe not GitHub, but likely you still use Git in a centralized model. Yes, it is technically distributed, and I know that people use it in a truly distributed manner, but if you're cutting builds or working with a sufficiently large team, it becomes very difficult not to consider at least one repo the "gold master".
[+] pm24601|11 years ago|reply
For me:

1. I do development offline all the freaking time. Basically I love having the entire repo stored on my disk for the same reason I have a laptop instead of a desktop - so I can code anywhere.

2. News flash: there are plenty of places that do not get good internet connectivity. Or my phone may be dead or dying. Or I just stopped caring about having the tethering option on my plan (and didn't care enough to bypass the restriction).

3. I don't use a lot of large images/video/etc in my development. Never have really, but then again I do a lot of server-side dev - very little Android coding.

4. Why do I care about a repo that is 1-2 gigabytes? I have a 4 terabyte disk.

Overall comment: this is a problem for Facebook and Google - for most companies it is not a problem. And if your company is having this problem, it has the money and people to solve it ... like Facebook and Google are doing.

[+] philtar|11 years ago|reply
The question is not when you do development offline. The question is: when have you ever needed the ENTIRE history of the repository offline?

Have you? Maybe. Do you constantly need to? Very, very unlikely, for the vast majority of people.

[+] liveoneggs|11 years ago|reply
a 2GB git repo (without any binaries) probably has 4-5 million objects; it will take a few GB of memory to clone, and status and other commands will otherwise be slow.
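The object count behind a repository's size is easy to inspect. A toy sketch (the repo contents here are obviously illustrative, not a multi-gigabyte repo):

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email dev@example.com   # hypothetical identity
git config user.name Dev
for i in 1 2 3; do
  echo "revision $i" > file.txt
  git add file.txt
  git commit -qm "rev $i"
done
git count-objects -v     # loose objects: 3 commits + 3 trees + 3 blobs
git gc -q                # pack them, as a real clone's packfile would be
git count-objects -v | grep '^in-pack'
```

Every one of those objects has to be enumerated for a clone, which is why object count, and not just byte size, drives clone time and memory use.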
[+] mmagin|11 years ago|reply
I find it fantastically useful to have the entire revision history on local disk, it makes it practical to actually do ridiculous searches through all of the history. Try doing that against your svn server.
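For instance, Git's "pickaxe" search walks every commit on local disk, which is exactly what makes those ridiculous history-wide searches practical (repo contents below are illustrative):

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email dev@example.com   # hypothetical identity
git config user.name Dev
echo 'def handler(): pass' > app.py
git add app.py; git commit -qm 'add handler'
echo 'def handler_v2(): pass' > app.py
git commit -qam 'rename handler'
# Every commit that added or removed the string "handler()":
git log --oneline -S'handler()'
# Grep file contents as of any historical revision, no server round-trip:
git grep -l handler_v2 HEAD
```

Against a Subversion server, the equivalent search means fetching or diffing every revision over the wire.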
[+] gecko|11 years ago|reply
You're describing weaknesses of Subversion, not of centralized versus distributed in either direction. E.g., we made blame for Git repos in Kiln ridiculously fast by caching memoized states for each file. You could also do that locally; we happened to do it server-side because it made more sense. There's no reason an outright centralized system couldn't do that (and indeed, some do, though neither CVS nor Subversion).
[+] DominikD|11 years ago|reply
At the same time, you're not grabbing a local copy of The Internetz to search it, and your searches are sufficiently fast. This is not a problem of centralized VCS (from the point of view of the user, of course) but of the quality of a particular implementation.
[+] kstenerud|11 years ago|reply
You don't need your own copy of the universe in order to search like that. You only need the capability built into the source-of-all-truth.
[+] to3m|11 years ago|reply
One point that isn't addressed is the difficulty of actually working with large numbers of unmergeable binary files, even assuming you've got the disk space to store them. Unless you've got some kind of centralized lock/unlock (check in/check out, etc.) functionality, serializing access to files is going to prove difficult.

I've seen it suggested that you should have some better means of coordination than what's in effect a per-file mutex. It's true you need to have a rough idea at a higher level what's going on (no point deciding ten people will all work on the same thing, when that means they'll all have to edit the same file!), but day to day, working at the file level, you still need the mutex to ensure people don't step on one another's work. It's a simple mechanism, and it scales about as well this sort of thing can.
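The per-file mutex really is that simple a primitive. A toy sketch of a centralized check-out/check-in cycle, with a `locks/` directory standing in for server-side lock state (everything here is hypothetical, not any real tool's protocol):

```shell
set -e
cd "$(mktemp -d)"
mkdir locks assets
echo 'binary blob' > assets/level1.bin   # stand-in for an unmergeable asset
checkout() {
  # mkdir is atomic, so it serves as the per-file mutex acquire
  if mkdir "locks/$1" 2>/dev/null; then
    echo "${USER:-someone}" > "locks/$1/owner"
  else
    echo "$1 is locked by $(cat "locks/$1/owner")"
    return 1
  fi
}
checkin() { rm -r "locks/$1"; }          # release after committing
checkout level1.bin                      # first user acquires the lock
checkout level1.bin || true              # second attempt reports the holder
checkin level1.bin                       # work committed, lock released
```

Perforce's exclusive check-out and Git LFS's file locking both provide essentially this, with the lock table held on the central server everyone can see.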

[+] robaato|11 years ago|reply
There is value in communicating status of work via a central SCM repository. For binary files and locking this is particularly so.

The approach of Perforce, where file status is tracked on the server, provides this. There are costs associated, so it's not a panacea for all SCM ills, but it's certainly worth considering.

One reason Perforce is so widely used in game development is that it scales straightforwardly to terabyte-sized repositories.

[+] mbleigh|11 years ago|reply
I agree with most of the points, in the sense that most developers aren't using the D enough to make DVCS worth it. However, I have to disagree about open source development before and after GitHub.

Ironically, it's the centralized user accounts on GitHub that made it really outstanding for open source. Now I don't have 100 different systems each with their own logins and conventions, I just have GitHub and all the myriad projects thereon.

Pull requests are better than patches because they are more explorable (quickly view diffs right from the browser), discussable (make comments on specific lines of code, mention other users), and programmable (webhooks to run tests against pull requests instead of manually pulling and running tests). Those are pretty big advantages.

[+] chaz72|11 years ago|reply
I have been wondering whether to go back to Subversion myself. The distributed option really doesn't apply to me, my minimal branching needs are met by Subversion, and oh my god, git's submodules get confusing.
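The confusing part of submodules is usually that a submodule is a pinned commit, not a tracked branch. A minimal sketch (local paths stand in for remote URLs; the `protocol.file.allow` override is only needed because this demo uses local file paths on newer Git):

```shell
set -e
base=$(mktemp -d)
git init -q "$base/lib"
cd "$base/lib"
git config user.email dev@example.com   # hypothetical identity
git config user.name Dev
echo 'lib code' > lib.txt
git add lib.txt; git commit -qm 'lib init'
git init -q "$base/app"
cd "$base/app"
git config user.email dev@example.com
git config user.name Dev
# Records a gitlink to one exact commit of lib, plus a .gitmodules entry:
git -c protocol.file.allow=always submodule --quiet add "$base/lib" vendor/lib
git commit -qm 'pin lib'
git submodule status    # shows the exact commit vendor/lib is pinned to
```

Because only that one commit is recorded, updating the library is always an explicit step in the parent repo, which is both the feature and the source of most of the confusion.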
[+] gecko|11 years ago|reply
(Author here)

To be clear, in real life, I do not actually like Subversion. I use Mercurial pretty exclusively for my own stuff, and would indeed use some of the centralized Mercurial extensions I linked in the article (e.g., remotefilelog, narrowhg, etc.) to scale upward if I had really big stuff flying around. The article is more about pointing out that going to a DVCS involves trade-offs, acknowledging that we have a lot of tooling designed to paper over those trade-offs, and discouraging thinking of DVCS as a strict upgrade rather than as an engineering decision with costs and benefits.

[+] liveoneggs|11 years ago|reply
why are you using submodules if your needs are simple?