We ask for the commit we want and connect to a node with BitTorrent, but once connected we conduct this Smart Protocol negotiation in an overlay connection on top of the BitTorrent wire protocol, in what’s called a BitTorrent Extension. Then the remote node makes us a packfile and tells us the hash of that packfile, and then we start downloading that packfile from it and any other nodes who are seeding it using Standard BitTorrent.
The disadvantage here is that any given file in the repo could be stored in an ever-increasing number of packfiles. Each existing version of the repo will generate a new packfile to get it to the newest version, and it's up to the authenticated masters to generate and seed each of those packfiles, while peers either do or do not cache this replicated data. In short, this means of syndicating updates ignores the Merkle-DAG-ness (DAG-osity?) of Git.
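To make the packfile handoff above concrete, here is a minimal sketch of the "remote tells us the hash, we download and check" step. It is an illustration only: `announce` and `assemble_and_verify` are hypothetical names, and a single SHA-1 over the whole packfile is a simplifying assumption (real BitTorrent identifies content by an infohash and checks per-piece hashes).

```python
import hashlib


def announce(packfile: bytes) -> str:
    # Seeder side (hypothetical): digest of the packfile it just generated.
    return hashlib.sha1(packfile).hexdigest()


def assemble_and_verify(pieces: dict[int, bytes], announced: str) -> bytes:
    # Downloader side: pieces may arrive from several peers in any order,
    # so reassemble them by index before checking the digest.
    data = b"".join(pieces[i] for i in sorted(pieces))
    if hashlib.sha1(data).hexdigest() != announced:
        raise ValueError("packfile does not match the announced hash")
    return data


packfile = b"PACK" + bytes(range(40))
digest = announce(packfile)
pieces = {i: packfile[i * 16:(i + 1) * 16] for i in range(3)}
restored = assemble_and_verify(pieces, digest)
```

Because the digest is fixed when the packfile is generated, any peer that cached the same packfile can serve pieces of it without being trusted.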
I agree it is really stellar. The billion dollar concept for me though is encrypted repo torrents. Imagine a group of servers that are hosting encrypted chunks which form the basis of a homomorphic encryption protocol for distribution using forward error correction to allow recovery of the deltas if n of m components of that delta can be recovered. Basically if you have the key you can pull out of this amorphous cloud your source code, and if you don't have the key you won't even know it is out there.
I started building a toy version of this about 5 years ago but got distracted by work. Essentially, the repo key encrypted the packfile, and the storage reliability layer used another key of its own to encrypt the chunks. The latter key would find the chunks, with enough reliability to re-create the encrypted packfile, which the former (repo) key could then decrypt and apply to your repo.
A very fun problem in distributed systems and data structures.
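For anyone curious what "recover if n of m components survive" looks like in its simplest form, here is a sketch using a single XOR parity chunk, which covers only the case n = m - 1. This is not the design described above (which is not spelled out here); a real system would more likely use something like Reed-Solomon coding. The names `encode` and `recover` are made up for the example, and the input stands in for an already-encrypted packfile.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal chunks (zero-padded) plus one XOR parity chunk."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def recover(chunks: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing chunk (marked None) and return the data chunks."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost chunk")
    repaired = list(chunks)
    if missing:
        survivors = [c for c in chunks if c is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired[:-1]  # drop the parity chunk


original = b"encrypted packfile bytes"
shares = encode(original, 4)      # 4 data chunks + 1 parity chunk
shares[2] = None                  # one storage node went away
rebuilt = b"".join(recover(shares)).rstrip(b"\0")
```

Note the trailing `rstrip` only undoes the zero padding; a real format would record the original length instead.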
> The un-updateability of torrents is something that seems to seriously limit its use.
I am not involved with the development of torrents at all, but (please bear with me until the end) my initial reaction is that we should think of the inability of torrents to update as a feature and not as a bug.
Perhaps if the ability of torrents to update is a concern, then it warrants a new peer-to-peer protocol? (Please note that this is not a case of http://xkcd.com/927/?cmpid=pscau, as I am not advocating a new protocol for every use case.)
It seems like we can sign torrent files with gpg keys. Perhaps I am wrong. Perhaps, we can allow updating in torrents if we require that the updates be signed with the same private key as the original torrent? Am I barking up the right tree here?
Edit: Oops. I edited this post before I saw the reply about BEP-0039. Apologies.
Peers don't have to seed packfiles the way we're used to for "traditional" seeding of movies or music; these packfiles actually represent the transition from one commit to another (instead of "all content from beginning to now"), so they are inherently ephemeral. They don't even need to be kept on disk, because they will be generated on the fly every time a client is interested.
The DAG-osity of git really helps here (because you only have to transfer what's really needed), and the "immutability" of git helps because if your project is popular and you update your branch, everyone will want to go from the old commit to the new commit, so everyone can share the diff between them directly.
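"Only transfer what's really needed" boils down to a set difference over the commit graph: everything reachable from the commit you want, minus everything reachable from what you already have. A rough sketch, assuming the graph is given as a plain dict from commit ID to parent IDs (the names here are illustrative, not Git's or GitTorrent's API):

```python
def ancestors(parents: dict[str, list[str]], tips) -> set[str]:
    """Return every commit reachable from the given tips, tips included."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def commits_to_send(parents: dict[str, list[str]], want: str, have: set[str]) -> set[str]:
    """Commits the requester is missing in order to reach `want`."""
    return ancestors(parents, [want]) - ancestors(parents, have)


# Linear history a <- b <- c, plus a merge commit d with parents c and a.
history = {"a": [], "b": ["a"], "c": ["b"], "d": ["c", "a"]}
needed = commits_to_send(history, want="d", have={"b"})
```

Everyone moving from the same old commit to the same new commit computes the same set, which is why the resulting packfile can be shared between them.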
> It surprised me that nothing like this seems to exist already in the decentralization community.
There was a GSoC project in 2013 which did exactly this, using Freenet as a decentralized storage backend with a Web of Trust for spam-resistant and updatable identities (note that in the GitTorrent scheme, once someone has claimed a username, that username will stick there forever). It works, and compared to GitTorrent it adds anonymity and upload-and-run.
Major fan of this idea. But how does one address the GUI challenges presented by leaving GitHub behind? It can't be overstated that GitHub provides an amazing (communal/social) user experience.
You're right, of course. This is just a first step.
One interesting followup idea might be that the BitTorrent library I'm using, webtorrent, also works in browsers over WebRTC. But I'm not using that because I wouldn't know what to do with a git cloned repo inside of a browser tab. Maybe someone else will though. :)
It's the same old problem of trying to build a peer-to-peer social network. How do you ensure that large files are distributed correctly and quickly with minimal security implications in an environment where nodes are constantly joining and leaving the network? Perhaps it's possible, but if there were an easy way of doing it, there would be more of that sort of thing around.
I guess the projects that can't be on GitHub won't mind the GUI challenges as long as they have some way to have a central repository without having to maintain a server on their own.
Do the same thing you did for GitTorrent. A GUI client that runs a torrent of a PHP file that connects to a torrent of a database file. You just need to always be connecting to the latest and greatest.
The post mentions using the blockchain for unique username registration and mapping to public key hashes, and as it turns out there's a project I and others have been working on that does exactly this called Blockstore.
The way it works is there's a mapping between a unique name and a hash in the blockchain, and then there's a mapping in a DHT from the hash to the data to be associated (which can be a plain old public key and can also be a JSON file that references a public key and other identity information).
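The two-step lookup described above (name to hash on the blockchain, hash to data in a DHT) can be sketched with two dictionaries standing in for the real stores. This is not Blockstore's actual API; `register`, `resolve`, the use of SHA-256, and the JSON profile layout are all assumptions made for the example.

```python
import hashlib
import json

chain = {}  # name -> hash of profile (stand-in for blockchain records)
dht = {}    # hash -> profile bytes   (stand-in for the DHT)


def register(name: str, profile: dict) -> str:
    if name in chain:
        raise KeyError(f"name already taken: {name}")
    blob = json.dumps(profile, sort_keys=True).encode()
    digest = hashlib.sha256(blob).hexdigest()
    chain[name] = digest   # small, permanent record
    dht[digest] = blob     # larger payload lives off-chain
    return digest


def resolve(name: str) -> dict:
    digest = chain[name]
    blob = dht[digest]
    # The DHT is untrusted, so check the payload against the on-chain hash.
    if hashlib.sha256(blob).hexdigest() != digest:
        raise ValueError("DHT returned data that does not match the registered hash")
    return json.loads(blob)


register("alice", {"pubkey": "hypothetical-public-key"})
profile = resolve("alice")
```

The point of the split is that the blockchain only has to carry a fixed-size hash per name, while the hash check means the DHT nodes don't need to be trusted.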
That's great, thanks! I should just use this (preferably with the DHT I'm already using to look up Git commits) instead of reimplementing myself.
What do you think about the idea of making pluggable modules to connect Blockstore with web frameworks (Django, Rails), without the framework/website authors having to get involved in understanding Bitcoin themselves?
The remaining hard part is the adapter layer to enable the extra applications such as the issue tracker to use the Git repository for storage. Joey Hess has a good article about Git "databranches" here: https://joeyh.name/blog/entry/databranches/
A very interesting idea, GitTorrent, but I have one question which comes to me whenever I read about a delta-based distribution scheme: who is going to generate and share all those deltas?
Some Linux distributions have experimented with delta-based package repositories; examples are deltup for Arch Linux and rpm-delta for RPM-based distros. Some of the known issues are:
- choosing the number and spacing between deltas. Fine-grained deltas require more storage space, coarse-grained deltas require more download bandwidth.
- retiring old deltas: periodically deleting all deltas older than a certain version, replacing them with the full package of that version. Again a trade-off between storage space and download bandwidth.
For Git repositories, this would roughly translate to:
- choosing the number, history spacing, and size of the Git packs per repository.
- retiring old Git packs: periodically deleting Git packs older than a particular revision, replacing them with a bare repository at that revision.
My impression was that peers are generating the deltas on-the-fly based on which commits the requesting peer states it needs. The problem's therefore shoved onto git itself, with the seeder just cherry-picking a specific range of commits from its own copy of the repo and bundling them together.
GitLab CEO here, thanks for mentioning us as the open source alternative. We think in the short term multiple organizations hosting their own GitLab is the way to go. It is hard to do issues and pull/merge requests in a decentralized way (the OP is impressive but it shows distributed git instead of distributed GitHub). I would like to see federated merge requests http://feedback.gitlab.com/forums/176466-general/suggestions...
Agreed that the argument is a bit weak, but we would still end up with a major centralized repository for a decentralized protocol. And the changes to make GitLab more like the proposal would probably be more work than just making the proposal a new project.
I need to go more in-depth on the proposal, but the first thing that strikes me is this: if you're going to use the Blockchain (with a capital "B") as storage for usernames and such, why not use Namecoin? It has the process for name consensus down. Also, it won't pollute the main Bitcoin blockchain.
I have a mild bias against altcoins, and have heard bad things about Namecoin in particular: that the anti-spam incentives aren't good, leading to illegal files stored in the blockchain itself, and that there's no compact representation (like Bitcoin's Simplified Payment Verification) for determining whether a claimed name is valid without consulting a full history.
As I understand it, these two design flaws combine to mean that you have to store some very illegal files to use a Namecoin resolver, which doesn't sound good to me. (I may be mistaken, since the bad things I heard about Namecoin came from Bitcoin people.)
Yeah! I was a Gitchain backer. The difference is that Gitchain stored the actual git commits in the blockchain, and I leave the actual commits on the hard disks of each BitTorrent seeder.
Stored GitHub in btsync for a startup before. It actually worked OK. We stopped doing it because btsync was trivial to crash with basic fuzzing and was closed source.
This is an aside, but I like the author's writing style. Not only is he clear, but he also describes why he's doing what he's doing and how it's important to him and others. Really helps me give the ideas more thought!
> Google Code announced its shutdown a few months ago, and their rationale was explicitly along the lines of “everyone’s using GitHub anyway, so we don’t need to exist anymore”.
I mean, sorta. It was also because running a service is expensive, and containing abuse is a constant thankless treadmill.
> We’re quickly heading towards a single central service for all of the world’s source code.
Far from it? Not that a fully-decentralized system seems bad, but there are many things that aren't GitHub. I don't even have anything of interest on GitHub.
I have to admit that I started reading this post feeling a little snarky about the concept. However, I think Chris makes an excellent case for the concept.
This is awesome, but it centralizes on JavaScript.
It is an implementation of a standard without the standard being defined so other implementations can spring up.
Git, one could argue, is language centralized also, which is technically true. That I don't have an answer for. But I don't believe handing off so much dependence to a JavaScript application fits for me.
A C/C++ application like Git I can overlook, at least for a decade or so, but JavaScript feels like a perpetual beta/prototype only. Granted, that's my subjective feeling.
The word "centralize" in the way you're using it is very misleading. Every piece of programming in existence uses some sort of language.
What you're really stating, I think, is that this is written in JavaScript and you don't like JavaScript. That's totally fine, and it's your prerogative. I'm sure that, like any other piece of programming, GitTorrent can be rewritten in other languages. If JavaScript bothers you that much, then do this: rewrite it in Ruby if that's what you prefer.
But please stop attacking the claim that this is "decentralised github" by claiming that this "centralises" on JavaScript. It doesn't "centralise" on any language. It's just a first implementation written in JavaScript.
I could see this integrating really nicely with mailing lists. The commit could be sent out to the mailing list, and anyone who is interested in reviewing the code or doing a merge would already have the information they need, no blockchain required.
If a version of this with friendly name support is released, I will mirror all my active GitHub repositories there.
If someone builds on this, as discussed elsewhere in the thread, to make a decentralized service that mimics 'social' functionality such as issues and pull requests, I will strongly consider using it instead of GitHub (depending on the UI, stability, etc.).
I don't even have any particularly popular repos, so there is no real reason for anyone to care about the above, but, y'know, HN comments approving of the idea don't necessarily translate into actual interest in the product, so now you know there's at least one person in the latter category. :)
[+] [-] rektide|10 years ago|reply
The un-updateability of torrents is something that seems to seriously limit its use. There are a lot of interesting attempts to hack around this: LiveStreaming and Nightweb are two that spring to mind. https://www.tribler.org/StreamingExperiment/ https://sekao.net/nightweb/protocol.html
[+] [-] Rhapso|10 years ago|reply
You can use it for git repos essentially out of the box by uploading your repo.
It is made of content addressed chunks which will get re-used on each re-upload.
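To illustrate why content-addressed chunks get re-used on re-upload, here is a toy chunk store. It is a generic sketch, not the storage format of any particular tool: the `ChunkStore` class, the fixed 4-byte chunk size, and SHA-256 keys are all assumptions for the example.

```python
import hashlib

CHUNK = 4  # unrealistically small, just to keep the example readable


class ChunkStore:
    def __init__(self):
        self.chunks = {}  # hash of chunk -> chunk bytes

    def upload(self, data: bytes):
        """Store data as chunks; return (manifest, number of chunks newly written)."""
        manifest, new = [], 0
        for i in range(0, len(data), CHUNK):
            piece = data[i:i + CHUNK]
            key = hashlib.sha256(piece).hexdigest()
            if key not in self.chunks:  # identical content maps to the same key
                self.chunks[key] = piece
                new += 1
            manifest.append(key)
        return manifest, new

    def download(self, manifest) -> bytes:
        return b"".join(self.chunks[key] for key in manifest)


store = ChunkStore()
m1, new1 = store.upload(b"aaaabbbbcccc")   # first upload writes every chunk
m2, new2 = store.upload(b"aaaabbbbdddd")   # re-upload writes only the changed chunk
```

With fixed-size chunks, an insertion near the start of a file shifts every later chunk boundary and defeats the re-use; content-defined chunking is the usual way around that.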
[+] [-] ChuckMcM|10 years ago|reply
[+] [-] minot|10 years ago|reply
[+] [-] rakoo|10 years ago|reply
[+] [-] uptown|10 years ago|reply
"Thinking about 'meta' torrent file format." https://gist.github.com/mait/8001883
Truly Meta - Meta: https://news.ycombinator.com/item?id=6920244
[+] [-] youvebeenbad|10 years ago|reply
[+] [-] ArneBab|10 years ago|reply
There was a GSoC project in 2013 which did exactly this, using Freenet as a decentralized storage backend with a Web of Trust for spam-resistant and updatable identities (note that in the GitTorrent scheme, once someone has claimed a username, that username will stick there forever). It works, and compared to GitTorrent it adds anonymity and upload-and-run.
A current article describing it is here: http://draketo.de/english/freenet/real-life-infocalypse (it got referenced here, too: https://news.ycombinator.com/item?id=9562749 )
The GSoC project was done by Steve Dougherty: http://www.google-melange.com/gsoc/project/details/google/gs...
> I’d be happy to work on a project like this and make GitTorrent sit on top of it, so please let me know if you’re interested in helping with that.
Have a look at Gitocalypse: https://github.com/SeekingFor/gitocalypse
[+] [-] arxpoetica|10 years ago|reply
[+] [-] cjbprime|10 years ago|reply
[+] [-] scotchmi_st|10 years ago|reply
[+] [-] johndevor|10 years ago|reply
[+] [-] towelguy|10 years ago|reply
[+] [-] mcdonji|10 years ago|reply
[+] [-] shea256|10 years ago|reply
The post mentions using the blockchain for unique username registration and mapping to public key hashes, and as it turns out there's a project I and others have been working on that does exactly this called Blockstore.
Here's the link if anyone wants to check it out: https://github.com/namesystem/blockstore
[+] [-] cjbprime|10 years ago|reply
[+] [-] dmarti|10 years ago|reply
[+] [-] ackalker|10 years ago|reply
[+] [-] pjc50|10 years ago|reply
[+] [-] yellowapple|10 years ago|reply
[+] [-] shmerl|10 years ago|reply
There is GitLab.
[+] [-] sytse|10 years ago|reply
[+] [-] eeZi|10 years ago|reply
[+] [-] doragcoder|10 years ago|reply
[+] [-] davexunit|10 years ago|reply
[+] [-] alganet|10 years ago|reply
[1]: http://fossil-scm.org/index.html/doc/trunk/www/index.wiki
[+] [-] doragcoder|10 years ago|reply
[+] [-] cjbprime|10 years ago|reply
[+] [-] k2enemy|10 years ago|reply
[0] http://ipfs.io
[+] [-] tperson|10 years ago|reply
[0] http://gateway.ipfs.io/ipfs/QmTkzDwWqPbnAh5YiV5VwcTLnGdwSNsN...
[+] [-] innovator116|10 years ago|reply
[+] [-] cjbprime|10 years ago|reply
[+] [-] alexnewman|10 years ago|reply
[+] [-] physcab|10 years ago|reply
[+] [-] durin42|10 years ago|reply
[+] [-] mrisse|10 years ago|reply
[+] [-] ianphughes|10 years ago|reply
[+] [-] tr4s|10 years ago|reply
[+] [-] decentrality|10 years ago|reply
Raised issue: https://github.com/cjb/GitTorrent/issues/12
[+] [-] swombat|10 years ago|reply
[+] [-] Skunkleton|10 years ago|reply
[+] [-] comex|10 years ago|reply