item 7690897

GitHub monoculture

156 points | ingve | 12 years ago | nedbatchelder.com

104 comments

sheetjs | 12 years ago
> Someone said to me, "I couldn't find coverage.py on Github." Right, because it's hosted on Bitbucket.

I was curious, so I went to bitbucket.org to try to find coverage.py. The experience was awkward, to say the least:

1) The Bitbucket homepage does not have a search bar, nor does it link to the search page [1]

2) Guessing that the URL is /search puts you in the team profile for the account `search`. Finally, you see what looks like a search bar at the top [2]

3) Searching for coverage.py doesn't show `ned / coverage.py` on the first page! There are lesser-known forks of coverage.py that rank higher than the original project. [3]

I'm sure these will improve over time, but discovering projects on Bitbucket is still incredibly difficult.

[1] http://i.imgur.com/soRXB98.png
[2] http://i.imgur.com/z5YmcL2.png
[3] http://i.imgur.com/uhg8rbc.png

hesselink | 12 years ago
Why not use Google? Searching for 'coverage.py', the Bitbucket repository is the fifth link for me. The first and second links are more general informational pages that link to the Bitbucket repository. Searching for 'coverage.py source repository' gives the Bitbucket repository as the first and second results.
cpeterso | 12 years ago
Bitbucket also has awkward URLs, where my fork of someone else's repo appears as a subdirectory of their project. I still can't get over the fact that the Bitbucket website won't (easily) let visitors search without creating an account. That's a terrible user experience, and I'm sure many potential users or customers just close the tab and go to GitHub.
Game_Ender | 12 years ago
If you are logged in, there is a search bar in the upper right-hand corner. It has the same problem as the search you describe.
sitkack | 12 years ago
Just do

    pip home coverage

and the home page for the package will pop up, just like with `homebrew`.

akerl_ | 12 years ago
I go to GitHub to find a project first because that's where I hope it is. GitHub's UI and culture go a long way towards standardizing repo layout in such a way that I can more easily use it, plus if I find bugs or issues there's the Issues and Fork links right there.

In my admittedly anecdotal experience, GitHub projects are way more likely to have a clear overview with a link to a GitHub Pages site with real docs, they're more likely to define the rules/procedures for sending in contributions, and they're more likely to have an active community of contributors. I know I'm not going to have to hunt for the source link in a paragraph somewhere, because if I know the repo name and author I can clone the source.

zzzeek | 12 years ago
If issue tracking could be distributed (yes, I know about Fossil and others), and players like github and bitbucket were compatible with it, then we would no longer depend on any of these services.

The way I'd host my projects would be:

1. I have my own server where I have a git repo, the issues repo, and maybe a wiki repo.

2. My server would host semi-statically generated project browsers: source code browsers, issue browsers, etc. Unlike when I was running Trac, I'd have no need to give people access to my server and no need to worry about spammers, and pages would be low-latency/cacheable, so I could allow crawlers even on a very small Linode host.

3. All of my wiki/issues/git would mirror out to github, bitbucket, and whoever else wants to participate, and updates on those sites would mirror back. I already do this now with my git repos using WSGI services and hooks.

4. When someone wants to post/update an issue, they link out to one of those services, or even run a local command-line/GUI-based issue system (wouldn't that be great??), and everything synchronizes back.
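The outward half of step 3 can be sketched as a `post-receive` hook on the server's bare repository that mirrors every ref to each configured host. This is a minimal sketch, assuming remotes named `github` and `bitbucket` already exist in that repository (both names are hypothetical):

```python
import subprocess

# Hypothetical remote names; use whatever `git remote -v` lists on the server.
MIRRORS = ["github", "bitbucket"]


def mirror_commands(remotes):
    """Build one `git push --mirror` command per mirror remote."""
    return [["git", "push", "--mirror", remote] for remote in remotes]


def push_to_mirrors(remotes, dry_run=False):
    """Push all refs to each mirror and return the names of mirrors that failed."""
    failed = []
    for cmd in mirror_commands(remotes):
        if dry_run:
            print(" ".join(cmd))
            continue
        # check=False so that one unreachable mirror does not stop the others.
        if subprocess.run(cmd, check=False).returncode != 0:
            failed.append(cmd[-1])
    return failed
```

A hook script would simply call `push_to_mirrors(MIRRORS)`. Note that `--mirror` also deletes refs on the mirror that no longer exist locally, so this only covers the outward direction; bringing changes made on the mirrors back needs a separate fetch-and-merge step that this sketch leaves out.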

The only missing link is a decent distributed issue system everyone can get behind. Building out the glue to sync between services is not that hard of a task.
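If issues live as data inside the repository and every mirror exchanges copies, the "glue" largely reduces to merging two copies of the tracker. A minimal sketch, assuming each issue carries a globally unique id and an edit timestamp (a hypothetical schema, not Fossil's or any real tracker's format), using last-writer-wins for fields and a union of comments:

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    issue_id: str  # globally unique, e.g. a UUID minted wherever the issue was filed
    title: str
    status: str  # "open" or "closed"
    updated: float  # timestamp of the most recent edit
    comments: list = field(default_factory=list)


def merge_issue(a: Issue, b: Issue) -> Issue:
    """Merge two copies of one issue: newest fields win, comments are unioned."""
    if a.issue_id != b.issue_id:
        raise ValueError("cannot merge different issues")
    # Naive tie-break: on equal timestamps the first argument wins.
    newer = a if a.updated >= b.updated else b
    comments = list(dict.fromkeys(a.comments + b.comments))  # dedupe, keep order
    return Issue(newer.issue_id, newer.title, newer.status, newer.updated, comments)


def merge_trackers(local: dict, remote: dict) -> dict:
    """Merge two {issue_id: Issue} maps, e.g. the server's copy and a mirror's."""
    merged = dict(local)
    for issue_id, issue in remote.items():
        if issue_id in merged:
            merged[issue_id] = merge_issue(merged[issue_id], issue)
        else:
            merged[issue_id] = issue
    return merged
```

Last-writer-wins silently discards one side of a concurrent edit to the same field; a real tracker would probably want something smarter, but the shape of the sync stays the same.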

masklinn | 12 years ago
> If issue tracking could be distributed (yes I know about Fossil and others), and the players like github and bitbucket were compatible with such, then we would no longer have any dependence on any of these services.

This is definitely the biggest problem. There are a bunch of "distributed" trackers that live inside the repository, but so far I've seen none that work well (even ignoring the lack of integration with gh/bb/whatever).

There's a second issue, though: the contribution workflow. It's not really possible to have a dozen different places where contributions can be proposed, discussed, hashed out and possibly rejected, and there's the sub-problem of tool integration (e.g. integration bots or CI runs on contribution proposals).

Gmo | 12 years ago
I totally agree with the author.

Worse, when I ask people what version control software they use, I quite often hear "github" rather than git, from people who don't understand the difference between the two.

hk__2 | 12 years ago
Yes, and some people say “just go on git” or “I found it on git”, meaning “GitHub”.
rdtsc | 12 years ago
Whenever Github goes down I always wish someone would invent a decentralized version control system.

Remember: at work, when Github goes down, everyone just sits there twiddling their thumbs for hours. As in "yeah, my commit is ready, just waiting to push it" or "waiting for the issue to be closed, then we can test".

It's a bit ironic, isn't it? One of Git's major selling points was: we are tired of one centralized version control system; you shouldn't need a single point of failure or be tied to the network in order to develop. And then years later we have one giant, global single point of failure. There is some irony there.

(Yeah, yeah, I know I can commit locally. But at some point you'll need to update issues and publish your changes. This is more of a general point; it would be the same if Bitbucket went down, and so on. It seems we never got to the mythical peer-to-peer internet of things; it just doesn't seem to work.)

dlisboa | 12 years ago
> But at some point you'll need to update issues, and publish your changes.

The second part is solved. You just need a second git server: maybe Bitbucket, maybe your own. We just never set one up, though, for reasons I don't understand either. I don't do it myself, even though I know how. There are multiple tools to push to different servers automatically, but we still take the comfortable route. Maybe it's the psychological thing of "if Github goes down it won't be for long, is it really going to affect me?".

Because of that, I don't agree that the tools are the problem, or even that there's some irony here. Git delivers on its promise. People just don't use it like that.

The issues part, or depending on Github to test/deploy software, is more of a vendor lock-in situation. A lot of people depend on it being up in order to work, but that's the convenience charge, since Github isn't open source. It's not much different from people depending on Heroku being up for their application to function.

icebraining | 12 years ago
We don't use github, but when our server is offline, we just spin up a VM from backup, push the latest commit to it, and start using it as the new centralized server. I think what you lack is a decent policy, not technology.
caitp | 12 years ago
Git is distributed, but obviously you're going to have a canonical repository somewhere (git.kernel.org, for instance). Decentralization doesn't mean getting rid of a canonical place where approved patches end up, and you really aren't going to see canonical repositories disappear; we have them for a good reason.

However, there is something to be said for mirroring canonical repositories for accessibility.

aeharding | 12 years ago
Do you even understand git? Literally push the whole repo wherever you want: `git remote add blah`. It doesn't matter where it's hosted. Many popular projects push their code to multiple services; it's no harder than pushing to one.

If github goes down and your devs are twiddling their thumbs because they can't push, you have larger problems.

masklinn | 12 years ago
> But at some point you'll need to update issues

You can reference (and thus comment on) and close issues from your commit messages.
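On GitHub this works with keywords such as `Fixes #12` or `Closes #30` in the commit message: the issue is closed once the commit lands on the default branch, while a bare `#7` only links to the issue. A rough sketch of that parsing, using a simplified regex rather than GitHub's own implementation (it ignores cross-repository forms like `owner/repo#12`):

```python
import re

# close/closes/closed, fix/fixes/fixed, resolve/resolves/resolved, then "#<number>".
CLOSING = re.compile(r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE)
# Any "#<number>" counts as a plain reference.
REFERENCE = re.compile(r"#(\d+)")


def issues_closed_by(message: str) -> list:
    """Issue numbers a commit message would close, in order of appearance."""
    return [int(n) for n in CLOSING.findall(message)]


def issues_referenced_by(message: str) -> list:
    """Every issue number mentioned, sorted and deduplicated."""
    return sorted({int(n) for n in REFERENCE.findall(message)})
```

For the message `"Handle empty input\n\nFixes #12, see also #7. Closes #30"`, the first function gives `[12, 30]` and the second gives `[7, 12, 30]`.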

> and publish your changes.

That does not require sitting on your ass and doing nothing though.

baby | 12 years ago
One of the big selling points of github is how little downtime they have.
fred_durst | 12 years ago
One of the nice things about Github and Bitbucket is that they both use Git. You can push to both of them and enjoy the best of both worlds.

    git remote add bitbucket git@bitbucket.org:username/reponame.git
    git remote add github git@github.com:username/reponame.git
You could even add another Git repository you control fully as well.
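Git can also make a single `git push` go to several hosts: one remote may have multiple push URLs, added with `git remote set-url --add --push`. One catch worth knowing is that once any push URL is set, git no longer pushes to the remote's fetch URL, so every destination has to be listed. A small sketch that issues those commands from Python (the `username/reponame` paths are the placeholders from above; `origin` is assumed to exist already):

```python
import subprocess


def multi_push_commands(remote, urls):
    """Commands that make `git push <remote>` push to every URL in `urls`.

    Once a remote has any push URL, git stops pushing to its fetch URL,
    so the original host must be in `urls` too.
    """
    return [["git", "remote", "set-url", "--add", "--push", remote, url] for url in urls]


def configure_multi_push(remote, urls, run=subprocess.run):
    """Run the commands; `run` is injectable so the logic can be tested without git."""
    for cmd in multi_push_commands(remote, urls):
        run(cmd, check=True)
```

Afterwards, `git remote -v` should list one fetch URL and two push URLs for `origin`.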
ChuckMcM | 12 years ago
So where is the 'Apache' of Github? Presumably that would be Gitlab, but it's not quite aligned with the Github experience.

One could imagine a package you install on your Droplet, EC2 instance, Linode, what-have-you, which would create a 'site' for your project that included a wiki, user management with roles, automated source code backup to Glacier, and automated mirroring to your choice of mirrors. Install it, set up your various keys (AWS, SSH to mirrors, etc.), then push your source to it and make it the origin master.

If you build that, people might start creating free-floating git repos, except that it would cost them a small amount of money (like $25/month), which would discourage many people.

_wmd | 12 years ago
Gitlab is a non-starter since it requires developers who can barely manage to use Git to also run their own servers. If that weren't the case, then we'd just have another GitHub-lookalike. I'm old enough to remember when self-hosted Subversion was a popular geek "badge of honor", and the frustration of trying to pull code from some guy's VPS that had been offline all weekend.

Something like Fossil's model seems better, where issue tracker and wiki are instead built into the repository. Sure, essentially you're still running a little mini web server on the local machine, but the presentation is completely different (a single binary rather than a huge Ruby webapp with many dependencies).

The problem isn't really that GitHub goes down, but that we can't mutate or make use of certain pieces of state while it is down (mostly issues/wiki), and nobody can see our changes until it comes back up.

Actually, Git comes with all the tools necessary to send pure data in the form of changesets by mail; it's just that their usability sucks unless you have a working sendmail/mutt config, which 99% of devs don't bother with any more.
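For reference, the tools in question are `git format-patch` (which writes each commit as an mbox-format email), `git send-email` to mail them, and `git am` to apply them on the receiving end. Because each patch is an ordinary email, Python's standard `email` package can read one; a minimal sketch:

```python
from email import message_from_string


def summarize_patch(patch_text):
    """Split one `git format-patch` file into author, subject, message and diff."""
    msg = message_from_string(patch_text)
    body = msg.get_payload()
    # format-patch writes the commit message, a "---" line, then diffstat and diff.
    message, _, diff = body.partition("\n---\n")
    return {
        "author": msg["From"],
        "subject": msg["Subject"],
        "message": message.strip(),
        "diff": diff,
    }
```

In practice you would pass it the contents of one of the numbered `.patch` files that `git format-patch` writes; the subject comes back with the `[PATCH]` prefix that `git am` strips when applying.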

That seems like the more interesting thing to fix, rather than just having another self-hosted SourceForge clone or whatever.

akerl_ | 12 years ago
Unless Gitlab or a similar product starts offering a way that I can search all the Gitlab projects, it's not going to amount to much. Github has taken over the space not only because it's easy and powerful for creators: it's also a powerful search engine for users. As the article notes, the first thing many of us think when looking for code is "Let's search GitHub".

Google is not efficient at searching for code. If each creator spins up their own project site using Gitlab or another tool, there needs to be some way of searching the network of Gitlab sites if we want users to actually switch.

Silhouette | 12 years ago
This article captures just a little of what I've been feeling for some time.

There are a couple of other things about GitHub specifically, and the presumption in some quarters that everyone uses it, that put me off.

Firstly, they create their own world where lots of things aren't quite like normal Git. Some are simple terminology changes, like using "fork" instead of "clone". Some are more intrusive, like tying the pull request mechanism to the GitHub review tool.

Secondly, GitHub's terms only let you have one unpaid account. If I want to maintain some personal repos, but also some with one of my professional hats on, it costs me money. If I want to keep my work on a major project separate from a few sidelines, it costs me money.

Thirdly, GitHub is an on-line service, and like anything else in the cloud, that means it comes with unknown security, privacy and reliability implications. For a big open source project, that might or might not be a problem, but I don't know why anyone would expect me to host my own projects there instead of, I don't know, in a Git repository on a server at home or at work, or something crazy like that.

So no, sorry, I won't fork you on GitHub. Nor will I contribute a patch via a pull request. Nor will I raise my helpful bug report on your project if you require me to submit it via GitHub. And I guess I'm lucky that I don't apply for jobs as an employee any more, because I am able to show approximately none of the best professional quality code I write without violating confidentiality agreements, and certainly none of it is in a GitHub repo for public inspection.

I am happy to support open source projects when I have a bit of time free, and I'm happy to demonstrate my credentials and abilities to prospective clients, but GitHub is a hoop that is rarely worth jumping through IME.

pekk | 12 years ago
Well, a fork isn't the same as a clone, so that is why there are different words.

There is no pull request mechanism not tied to a site, unless you want to send an email with a public git URL yourself. Which you can still do, assuming the project you are working with has specified that. But why should you be able to dictate to other people that they do it this way? Github gives you the public URL, if you want it to, and lots of people do.

Hosting your own source repository also has unknown security, privacy and reliability implications (or known ones that are often not very good).

It doesn't really matter to Github users if you don't want to use Github. It doesn't make a lot of sense that you would decline to make bug reports just because the project uses Github. That sounds like you are boycotting those projects for some moral reason.

__david__ | 12 years ago
> Secondly, GitHub terms only let you have one unpaid account. If I want to maintain some personal repos, but also some with one of my professional hats on, it costs me money. If I want to keep my work on a major project separate to a few sidelines, it costs me money.

What they do allow, though, is adding new "organizations". These are effectively groups that get top-level URLs just like users (https://github.com/galvanix/ is one that my business partner and I set up).

When you create a repo you can put it in your main account or in any of the organizations you are part of.

If your organization doesn't host any private repos then there is no cost.

Now, this isn't completely disconnected from your main account—you still do everything in that name, but it's close.

> …because I am able to show approximately none of the best professional quality code I write without violating confidentiality agreements, and certainly none of it is in a GitHub repo for public inspection.

Well, ain't that the truth for most of us! :-)

mattip | 12 years ago
Mercurial is so much easier for day-to-day usage. I work with both, and I need to explain so much more when getting a new person up to speed with git. The "detached head" state is very difficult to explain.
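One concrete way to show the difference is the `.git/HEAD` file: normally it holds a symbolic ref such as `ref: refs/heads/master`, while after checking out a tag or commit directly it holds a bare commit hash, which is all "detached HEAD" means. A small illustrative sketch that interprets that file's contents (for teaching only; real scripts should ask git itself, since `.git` is not always a plain directory, e.g. in worktrees):

```python
import re

# 40 hex digits for SHA-1 repositories, 64 for SHA-256 ones.
COMMIT_HASH = re.compile(r"[0-9a-f]{40}|[0-9a-f]{64}")


def describe_head(head_contents):
    """Explain what the contents of .git/HEAD say about the checkout."""
    text = head_contents.strip()
    if text.startswith("ref: "):
        # Symbolic ref: HEAD follows a branch, so new commits advance that branch.
        return "on branch " + text[len("ref: "):].removeprefix("refs/heads/")
    if COMMIT_HASH.fullmatch(text):
        # Bare hash: new commits belong to no branch until you create one.
        return "detached at " + text[:7]
    raise ValueError("unrecognized HEAD contents: %r" % text)
```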
noir_lord | 12 years ago
I absolutely agree, but I still use git.

Since git has largely won the DVCS war, and I'm a one-man developer who has to interoperate with others, it makes sense for me to suck it up and use git myself.

I really dislike git though.

curun1r | 12 years ago
Mercurial may be easier to use than git, but Github is easier to use than Mercurial. Switching to a workflow where all commits are made to forks and all updates to master come through pull requests largely solves the problems with using bare git.

Which is kinda the problem we're discussing here. Git is far less usable without Github, and yet all the workflow niceties that we've come to rely upon are locked up in Github's proprietary implementation. The more developers come to rely upon Github features, the more locked into Github we are and the more difficult it would be if Github were to go away or become obnoxious.

masklinn | 12 years ago
That's completely irrelevant to TFA (github is not the only git host after all)
jader201 | 12 years ago
I wouldn't mind -- strictly from a user standpoint -- if all code was in one place. It's easier to collaborate and contribute to other repositories when they're all in the same SCM and host. If someone has their source in another host (whether it's Git or some other SCM), I'm less likely to bother contributing -- and I would assume this is the case with many others.

Of course, from an "all your eggs in one basket" or business monopolization standpoint, I agree with the author.

But selfishly speaking, I like Github and find that it adds more value to me the more it becomes a monoculture.

threatofrain | 12 years ago
Github isn't the only choice, but I think it's the preferred choice for a code-sharing community because they basically give their product away for free to open-source projects with public repositories. And it's easy and good-looking.

Bitbucket distinguishes itself from Github by being free only for small teams, but including free private repositories. It seems they are more interested in businesses and proprietary code.

nebstrebor | 12 years ago
I like Github, but we just switched our business repos over to Bitbucket. Github's pricing for businesses isn't even competitive.
kylemaxwell | 12 years ago
So BitBucket and Gitorious and others will need to excel and beat GitHub at their game. Competition will eventually solve this the same way we aren't beholden to the same methods and sites we used 10 years ago. The next winner may not even exist yet and I expect that the barriers to entry here are not nearly as high as those in other markets that have been disrupted over the last few years.
leorocky | 12 years ago
GitHub has already won due to network effects. It's like trying to unseat Facebook as the dominant social network. It will require a sea change in the technology used for version control or sharing code.
akerl_ | 12 years ago
The competition you describe exists now. Having a new "winner" won't change the point being made in the article. I don't agree with the conclusion he came to, but he's pointing out issues with there being a winner whose site is the de facto place where code / project / people go.
frankpinto | 12 years ago
I use Bitbucket too. Last time I checked, Github didn't provide free private repos, which we needed as a bootstrapped startup. The centralization vs. distribution+independent evolution debate is always an interesting one, though. I think you need high-quality centralization at first, and then competition can be useful. Github set the bar; now let's see what everybody else can bring to the table.
malandrew | 12 years ago
I would love it if Github built a contributor license agreement into my repos so that all issues and wiki content were licensed to the repo owner. This would give repo owners the ability to move from one version control service to another without getting locked in by copyright issues on issues and comments made by others.
ryanmarsh | 12 years ago
This is a non-issue and I don't mean that in a small way.

Github has produced a near monopoly on hosting for popular open source projects because it is so liked (as OP himself admits). Soon enough something better will come along that makes Github look like SourceForge and then we will all migrate our projects over there.

carlio | 12 years ago
Finding out a project is on Bitbucket or uses Mercurial has actually stopped me wanting to contribute. The additional hassle of remembering how Mercurial works, digging out my BitBucket credentials etc, is a pain in the backside.
deckiedan | 12 years ago
You can use bitbucket just fine to host git projects.

I use a mixture of bitbucket and github for different projects, not wanting all my eggs in one basket, really. I find them pretty much equal in most respects, except that being able to have private repos on Bitbucket is very convenient, and GitHub Pages is also very useful. I wish Bitbucket had something similar.

https://bitbucket.org/dfairhead/streetsign-server in case you're interested.

mamcx | 12 years ago
The opposite for me (not github, but git). I wish GitHub supported Mercurial. I have used hg-git, but it's still one more thing to deal with.
sitkack | 12 years ago
Bitbucket supports git.
weixiyen | 12 years ago
I don't see why this is a bad thing.

There is inherently less cognitive overhead to use something you are already familiar with. Same reason why people wouldn't give Bing the time of day.

All things being equal, if you just had to choose a public place for a remote repo, Github should be your first choice unless you have a very good reason to use something else.

I'd rather not have to learn a completely different UI than the one I'm used to in order to use or contribute to a public repo, and I don't expect consumers of my code to have to either.

nickstinemates | 12 years ago
I recently started storing a new project and its services on Bitbucket for a variety of reasons:

- I think it's healthy to always look at the different tools in the space to witness first hand if you're missing out on something

- There's some product integration in the works, so, it only makes sense to get curious about how existing users of the platform may feel once it is there.

So far it's going well, but it's early days.

nmrm | 12 years ago
I actually much prefer bitbucket's issue tracker to github's.
kyrra | 12 years ago
It's interesting to see the progression of public source control hosting. SourceForge was the go-to place for many years, but at this point it's really only used for hosting large binaries, and very few people host code there. Google Code started to steal some people away from SF, but again it was more focused on project hosting than on just getting code to people.

Github is interesting in that the code is shown front and center whenever you visit a project page. Sure, there is the Readme.md and wiki stuff you can do, but it makes browsing code and project discovery very easy. Github also made project creation and management super easy and lightweight. The number of questions you need to answer to set up a repo is minimal. It gets out of your way when setting up a project.

I still feel like Google Code has some things going for it, the biggest being its defect tracker compared to Github's.

greggman | 12 years ago
I actually don't feel like having the code at the top is helpful for me. I rarely look at code on github. I go to a project page and scroll down past the code to read the readme. That happens easily 37 times out of 38. Yeah, I made up those numbers. I know there have been a few times I've clicked on the code, but it's so seldom that I'd much rather the readme were front and center and there was a "code" button/tab.

If the project sounds interesting, I download it and look at the code locally. The few times I do look at the code online are in response to, say, a Stack Overflow question, when I don't already have the code locally and want to look something up.

But that's just me. It's not that big a deal for me to scroll down. If everyone else goes to the code more than the readme under it, then fine.