top | item 18711980

Kademlia: A Peer-To-peer Information System Based on the XOR Metric (2002) [pdf]

191 points | liamzebedee | 7 years ago | pdos.csail.mit.edu

40 comments

[+] bcaa7f3a8bbc|7 years ago|reply
I'm familiar with this paper because I used eMule. eMule cites this paper directly in its "help" window; without it, the decentralization of eD2k would not have been possible. The protocol is old, but still impressive.

The year 2000 is special: within the same year, Gnutella, eDonkey and Freenet all released their initial versions, as if the P2P network "emerged" from the Internet. By 2004, eMule, I2P and BitTorrent (and, BTW, Tor) were all here. At the very beginning, the first generation of P2P, Gnutella, suffered from a scalability problem: its simple, unstructured flooding protocol requires O(n) messages per query, and the network became unstable once users numbered in the millions.

Then Kademlia almost saved P2P. BitTorrent, I2P, eD2k and many other P2P protocols are all Kad-based. For example, in eD2k, Kad enabled P2P resource discovery, and also an efficient decentralized keyword search engine (although there is spam, it mostly works, and in BitTorrent we still don't have one), all with only O(log n) messages per lookup. The P2P mechanism of I2P is also exclusively based on Kad.
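The O(log n) lookup cost mentioned above comes from Kademlia's core idea: IDs and keys live in the same space, and "distance" is just bitwise XOR, so each routing step can halve the remaining distance. A minimal sketch of the metric (illustrative names, not tied to any real implementation):

```python
# Kademlia's XOR metric: the distance between two IDs is their bitwise
# XOR, interpreted as an unsigned integer. Lookups repeatedly query the
# nodes closest to the target under this metric.

def xor_distance(a: int, b: int) -> int:
    """XOR distance between two node/key IDs."""
    return a ^ b

def closest_nodes(target: int, nodes: list[int], k: int = 3) -> list[int]:
    """Return the k known nodes closest to `target` under the XOR metric."""
    return sorted(nodes, key=lambda n: xor_distance(n, target))[:k]

nodes = [0b0001, 0b0100, 0b0101, 0b1100]
print(closest_nodes(0b0111, nodes, k=2))  # → [5, 4]
```

Note that XOR is a genuine metric (d(a, a) = 0, symmetric, and it satisfies the triangle inequality), which is what makes "keep querying whoever is closest" converge.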

The authors of this paper are among the few people who have made great contributions to the Internet.

[+] nostrademons|7 years ago|reply
I love how people are rediscovering the previous generation of P2P software from 15-20 years ago. Reading through these papers was a huge part of my college procrastination; maybe now that cryptocurrencies give a way to provide economic incentives to those who contribute computing power, we might see a renaissance of some of these older ideas.
[+] jamesblonde|7 years ago|reply
Interestingly, the original Kademlia paper did not make it into any of the big p2p conferences at the time; it was published at a second-tier p2p workshop. It was only when the system gained actual adoption that people referenced the paper.

In contrast, Chord, which was published a year earlier, in 2001, is one of the most cited papers in computer science. However, Chord was not practical: it assumed the availability of bi-directional connections between any two hosts, and NATs, which had just started to appear, broke that assumption. Kademlia, which does not require peers to maintain a well-defined routing structure (the ring in Chord), can muddle through with NATs: the routing table is a huge messy tree, and if any branch leads you to the destination, it will kind of work.
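The "huge messy tree" above is a set of k-buckets: one bucket per possible bit-length of XOR distance from your own ID, each holding at most k contacts. A toy sketch (constants and names are illustrative; real Kademlia uses 160-bit IDs and k = 20):

```python
# Toy Kademlia routing table: one bucket per bit-length of XOR distance,
# each holding at most K contacts. Any live contact in roughly the right
# bucket is enough to make progress toward a target.

K = 2            # bucket capacity (real Kademlia uses k = 20)
ID_BITS = 8      # toy ID width (real Kademlia uses 160-bit IDs)

class RoutingTable:
    def __init__(self, own_id: int):
        self.own_id = own_id
        self.buckets = [[] for _ in range(ID_BITS)]

    def bucket_index(self, node_id: int) -> int:
        # Bucket i holds nodes whose XOR distance has highest set bit i.
        return (self.own_id ^ node_id).bit_length() - 1

    def add(self, node_id: int) -> None:
        if node_id == self.own_id:
            return
        bucket = self.buckets[self.bucket_index(node_id)]
        if node_id not in bucket and len(bucket) < K:
            bucket.append(node_id)  # real Kademlia pings the oldest entry instead

rt = RoutingTable(own_id=0b00000000)
for n in [0b00000001, 0b00000011, 0b10000000, 0b11000000]:
    rt.add(n)
print([i for i, b in enumerate(rt.buckets) if b])  # non-empty buckets: [0, 1, 7]
```

This looseness is exactly why NATs hurt less here than in Chord: a bucket with some unreachable peers still works as long as one reachable contact remains.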

[+] pjkundert|7 years ago|reply
It is a key part of the implementation of Holochain (https://holo.host).

It turns out that global consensus is not required to maintain accounting balances in a global-scale ledger. DHTs, and their validation of the data they host, are a big part of maintaining the invariants of the system.

[+] gst|7 years ago|reply
Kademlia is still in widespread use today. For example, BitTorrent uses it to resolve magnet links.
[+] squarefoot|7 years ago|reply
I still miss the first Napster and, later, the OpenNap network with its OSS clients such as Lopster. Way flawed and not scalable at all compared to BitTorrent, but it allowed 1-to-1 connections between people. One day I browsed a random guy's shared hard drive because he had some interesting music, saw a band named "Spock's Beard", and being a Trekkie myself, albeit a moderate one, I simply had to hear what they sounded like. After an eternity due to then-slow connections, and thanks to other bands discovered the same way, I realized that progressive rock was (and still is) even more alive today than in the 70s, although obviously not mainstream. So a huge thank you to the old flawed and slow Napster as well.
[+] eeZah7Ux|7 years ago|reply
Kademlia is also an interesting option for a small and resilient key-value store between a handful of hosts.
[+] jMyles|7 years ago|reply
Kademlia is awesome and is, of course, the basis of libp2p-kad-dht and the DevP2P project that many decentralized projects are using at the moment.

At NuCypher, we found that, all the way to the core of Kademlia, there were some "inter-planetary" assumptions that, while both mindblowing and fit for many use-cases, didn't fit our "intra-planetary" distributed notion of network topology.

If Kademlia interests you, you might also be interested in this overview of what we did and why:

https://blog.nucypher.com/why-were-rolling-our-own-intra-pla...

[+] mohitmun|7 years ago|reply
Huge fan of NuCypher. Recently, at a hackathon, I used your proxy re-encryption to build a decentralised KYC system. Amazing work you guys are doing!
[+] jondubois|7 years ago|reply
A lot of cryptocurrency projects are using Kademlia, but it's prone to vulnerabilities such as eclipse attacks. Because peer connections are made according to deterministic rules (e.g. often based on unique node IDs), the topology of the network can be analyzed and the resulting information exploited to disrupt the network.

There are solutions that involve making node IDs hard to generate (I've seen it done with IDs generated via hash-based proof of work), but nowadays, with cheap ASIC miners, an attacker can compute these hashes much faster than any regular computer. This means the node-ID difficulty would need to be tuned to top-of-the-range ASICs to make the network truly secure, which adds a lot of complexity when deploying a node. I don't believe any of the existing Kademlia-based cryptocurrencies currently require sufficiently hard-to-generate node IDs.
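The hash-based proof-of-work scheme described above can be sketched as follows: a node ID is only accepted if hashing it yields a digest with enough leading zero bits, so generating an ID is expensive but verifying one is cheap. The difficulty constant and function names here are illustrative, not from any real deployment:

```python
# Sketch of proof-of-work node IDs: IDs are expensive to mint (brute-force
# search) but cheap for any peer to verify, which raises the cost of
# flooding the network with attacker-chosen IDs (eclipse attacks).
import hashlib
import os

DIFFICULTY_BITS = 16  # illustrative; a real network would tune this


def leading_zero_bits(digest: bytes) -> int:
    """Count leading zero bits of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def generate_node_id() -> bytes:
    """Brute-force a random 160-bit ID whose hash meets the difficulty target."""
    while True:
        candidate = os.urandom(20)  # 160-bit ID, as in Kademlia
        if leading_zero_bits(hashlib.sha256(candidate).digest()) >= DIFFICULTY_BITS:
            return candidate


def is_valid_node_id(node_id: bytes) -> bool:
    """Cheap verification: one hash, no search."""
    return leading_zero_bits(hashlib.sha256(node_id).digest()) >= DIFFICULTY_BITS


node_id = generate_node_id()
assert is_valid_node_id(node_id)
```

The comment's ASIC point maps directly onto `DIFFICULTY_BITS`: a target high enough to deter ASIC-equipped attackers also makes honest ID generation slow on commodity hardware.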

[+] tombert|7 years ago|reply
I wrote a video-sharing thing during a hackathon based on Kademlia. It's one of my favorite algorithms, and I found the paper easier to read than most algorithm papers. Here's a video of a talk I gave on it:

https://www.youtube.com/watch?v=byIFT8OzfX0

[+] woadwarrior01|7 years ago|reply
Over a decade ago, I was part of a small team at a startup that built a p2p messenger called cSpace[1]. I fondly remember reading the paper and implementing a Kademlia DHT for it. IIRC, the BitTorrent DHT was the only open-source Python implementation we could find; it used Twisted for networking and was quite complicated, so we decided to keep things simple and rewrite it ourselves. The resulting implementation[2] is quite simple and elegant, IMO.

[1]: https://github.com/jmcvetta/cspace (the only copy of it I could find online)

[2]: https://github.com/jmcvetta/cspace/tree/master/cspace/dht

[+] loeg|7 years ago|reply
(2007), if not earlier, FWIW. I used this protocol in a college final project ("make a botnet") years ago :-).
[+] miguelrochefort|7 years ago|reply
Has any progress in P2P been made since Kademlia?

I've been searching for a while for a DHT (actually a distributed graph database) that self-optimizes based on usage. Basically, I want data that's frequently accessed together to end up close together, and close to the nodes that frequently access it.

[+] xj9|7 years ago|reply
in addition to a DHT, you need another layer that can map key names to their hashes. ideally this is also something you can store in the DHT (using ipns / mutable keys). once you have a (distributed) key-value store, you can build pretty much any kind of database[1]. this is known to work well for hadoop-style workloads[2] and i'd expect you could get similar performance from a p2p graph db.

[1]: https://apple.github.io/foundationdb/layer-concept.html

[2]: https://news.ycombinator.com/item?id=15543596
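The naming layer the parent comment describes can be sketched in a few lines: human-readable names are hashed into the DHT's fixed-width key space, so the DHT itself only ever routes on uniform keys. The function name and the stand-in store are illustrative:

```python
# Mapping human-readable key names into a DHT's key space by hashing.
# A real system would route put/get through the DHT; a local dict
# stands in for it here.
import hashlib


def dht_key(name: str) -> bytes:
    """Derive a fixed-width DHT key from a human-readable name."""
    return hashlib.sha1(name.encode("utf-8")).digest()  # 160 bits, like Kademlia


store: dict[bytes, str] = {}
store[dht_key("alice/profile")] = "value stored under the hashed name"
print(store[dht_key("alice/profile")])
```

Hashing makes keys uniform and fixed-width, but it also destroys locality, which is exactly why the grandparent's wish (frequently co-accessed data ending up near each other) needs an extra layer on top rather than falling out of the DHT for free.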

[+] new12345|7 years ago|reply
I don't understand why, in spite of having working p2p protocols, we don't yet have a good p2p alternative to Dropbox that just works without any setup hassles. The closest working thing I know of is BitSync, but that too requires a mediator server.
[+] yholio|7 years ago|reply
Because Dropbox is a file storage service, not a file distribution system. The typical use case is that an uploaded large file will be downloaded a single time by someone else, with the two computers online at different times.

This requires an intermediary system with very high availability and performance, which in P2P can only be implemented by duplicating data across a dozen or so systems.

Since files must be private and there is no commonality like in BitTorrent, it follows that running a P2P Dropbox would consume 5-10 times the storage and bandwidth of its centralized counterpart for the same useful service.

[+] amitport|7 years ago|reply
IMO, the p2p business model is problematic in this case.

1. People need incentives to join a p2p network, and certainly to contribute storage + bandwidth.

1*. Reliability and privacy issues make this an even bigger problem (though p2p may also be the solution, it's not easily achievable).

2. This can't easily happen without investment. Marketing + setup without "hassles" is not cheap.

2*. Given investment, what are the assets of the investors?

[+] e12e|7 years ago|reply
IPFS doesn't take much setup, but it's global sharing, so you'll have to bring your own access control (e.g. encrypt the files and use a key distribution/sharing scheme of some sort).
[+] jonathanstrange|7 years ago|reply
I've been out of the loop for a few years now, but isn't NAT hole punching still one of the biggest problems? AFAIK, there is still no reliable solution that works without an accessible 3rd-party server.
[+] dmourati|7 years ago|reply
Attended a "Papers We Love" presentation on this paper last year.