
How Merkle Trees Enable the Decentralized Web

300 points | EGreg | 8 years ago | taravancil.com

52 comments

[+] basemi|8 years ago|reply
A similar concept (Merkle-DAG) is implemented by IPFS (ipfs.io)

"A Merkle-DAG is similar to a Merkle tree in that they both are essentially a tree of hashes. A Merkle tree connects transactions by sequence, but a Merkle-DAG connects transactions by hashes. In a Merkle-DAG, addresses are represented by a Merkle hash. This spider web of Merkle hashes links data addresses together by a Merkle graph. The directed acyclic graph (DAG) is used to model information. In our case, modeling what address has stored specific data. "

(https://www.cio.com/article/3174193/healthcare/from-medical-...)
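For readers unfamiliar with the structure being quoted, here is a minimal sketch in Python of computing a Merkle root over a list of leaves (the function names and the leaf data are illustrative, not from IPFS):

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256 used as the tree's hash function (an assumption for this sketch)."""
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute the root of a binary Merkle tree over raw leaf data."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # odd level: duplicate the last node
            level.append(level[-1])
        # hash each pair of siblings to form the parent level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root([b"tx1", b"tx2", b"tx3"])
```

Because every parent is a hash of its children, changing any leaf changes the root, which is what makes the root usable as a content address.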

[+] EGreg|8 years ago|reply
Why can't there be cycles in the Merkle graph

That would be cool. You just have a partial order between children and parents, like in git.

[+] dracodoc|8 years ago|reply
To rephrase my comment:

Some disadvantage of OP's method:

- Hashes are difficult for humans to read. URLs within a hierarchy share common patterns and carry some meaning; hashes all look the same and give no hint about the content.

- You have to copy a hash; it's almost impossible to type one, or to compare two visually similar strings.

- You may end up with a link-shortening service for hashes, but you could already use link shortening to solve the portable-file-hosting problem.

Merkle trees can solve some problems, but I don't think portable URLs are the right one.

[+] subless|8 years ago|reply
I agree 100%. Also, Merkle trees would only benefit static content uploaded to the Internet: updating dynamic content would constantly generate a new root hash, meaning a new URL for every update. It's just not a good option for anything other than static content.

One place I could see it being valuable is an online archiving service like Archive.org, where the content doesn't change except when a new snapshot is recorded to capture any changes.
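The point about updates can be seen directly: any edit, however small, produces a completely different content address (SHA-256 is used here for illustration):

```python
import hashlib

def content_address(data: bytes) -> str:
    """Address content by the hex digest of its SHA-256 hash."""
    return hashlib.sha256(data).hexdigest()

v1 = content_address(b"Hello, decentralized web!")
v2 = content_address(b"Hello, decentralized web!!")  # a one-character edit
# v1 and v2 are entirely different addresses, so every update breaks old links.
```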

[+] indigochill|8 years ago|reply
From a dumb layman's perspective (mine), it seems as though the addressing issue has a very simple solution: to refer to some arbitrary file hosted by a known third party (say, a video on YouTube), host a link to it on your own server and distribute your server's link to others. When switching from YouTube to Vimeo, simply change the one link on your server and the link you distributed remains valid.

This also avoids the danger of hash collisions, although it is vulnerable to third-party hosts taking down the content (although again, there's not much cost to that. Just move the content elsewhere and change your redirect link).
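The indirection described above is essentially an HTTP redirect. A minimal sketch using Python's standard library (the `TARGET` URL is hypothetical; swapping hosts means changing only that one line, while the distributed link to this server stays valid):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical current host of the content; edit this when the content moves.
TARGET = "https://www.youtube.com/watch?v=example"

class Redirect(BaseHTTPRequestHandler):
    def do_GET(self):
        # 302: the canonical link (this server) stays stable,
        # the actual location can change at any time.
        self.send_response(302)
        self.send_header("Location", TARGET)
        self.end_headers()

def run():
    HTTPServer(("localhost", 8000), Redirect).serve_forever()

# run()  # uncomment to serve
```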

With this method you do still have some server responsible for hosting the content rather than distributing that load, but that's actually a saving when looking at the storage load across the entire ecosystem.

It's a cool article and I really like the idea of content addressing instead of host addressing. I just feel like it's too divergent from the web as most people use it today (and I'm mildly concerned about malicious hash shenanigans), whereas the above method can be used and understood right now by many web users.

[+] ENGNR|8 years ago|reply
CDNs have made hosting pretty easy to scale from a low-powered controller; the hard problem is working out when to invalidate the cache. Which is easily solved by... another layer of Merkle trees, actually! It's Angela Merkels all the way down.
[+] dracodoc|8 years ago|reply
Or you can go one step further and use a third-party server to provide the link, because maintaining your own server is a burden for many people. "Your server" is most likely somebody else's server anyway.

That would be Google Voice for URLs. The OP's method is using people's names as phone numbers.

Another problem with the OP's method is that it's difficult for humans to read and very difficult to verify by eye. There are also no common parts for files organized in the same hierarchy.

[+] api|8 years ago|reply
Hash collisions are really a non-issue. A collision in SHA-256 or larger spaces is less likely than being, say, struck by a meteorite twice in one day. When you get up to 384 and 512 bit hashes you could content address the entire Internet for thousands of years and likely never have a collision.
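The standard birthday-bound approximation backs this up: for n random items in a b-bit space, the collision probability is roughly n²/2^(b+1). A quick sketch (the quadrillion-hash figure echoes the estimate given further down the thread):

```python
def collision_probability(n: float, bits: int) -> float:
    """Birthday-bound approximation: p ~ n^2 / 2^(bits + 1).

    Valid when p is small; it overestimates slightly otherwise.
    """
    return n * n / 2 ** (bits + 1)

# A quadrillion (1e15) SHA-256 hashes:
p = collision_probability(1e15, 256)
# p is on the order of 1e-48 -- utterly negligible.
```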
[+] abrax3141|8 years ago|reply
It’s not so much the M-trees that make this work, as the idea of distributing the leaves around so that you can’t efficiently corrupt the tree by replacing it. If it was all in one machine you could just fake the tree. According to the original bitcoin paper, this idea came from Haber and Stornetta. (I have a personal theory that Stornetta is Satoshi ... who doesn’t reference their own papers multiple times?!)
[+] yosito|8 years ago|reply
Can someone ELI5 what happens when I want to change a piece of data in a tree? What if I have a document, and I notice a typo, and I want to fix it?
[+] nattmat|8 years ago|reply
By using an append-only Merkle tree, you make 'edits' by adding a new piece of data with information on what was changed in the older data. You get the benefit of also keeping a version history. I think Git works like this.
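The append-only idea can be sketched as a chain of versions, each new entry hashing its content together with the previous version's hash (loosely how Git commits chain; the field names here are illustrative):

```python
import hashlib
import json
from typing import Optional

def version_hash(content: bytes, parent: Optional[str]) -> str:
    """Address a version by hashing its content plus its parent's hash."""
    payload = json.dumps(
        {"content": content.decode(), "parent": parent},
        sort_keys=True,
    ).encode()
    return hashlib.sha256(payload).hexdigest()

v1 = version_hash(b"Helo, world", None)   # original document, with a typo
v2 = version_hash(b"Hello, world", v1)    # the 'edit', appended on top
# v1 is still addressable; v2 links back to it, preserving the full history.
```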
[+] toomim|8 years ago|reply
Exactly. Then Merkle trees don't help you. They only work for static documents.
[+] skate22|8 years ago|reply
Good article, but there are a lot of claims of '100% certainty' that aren't necessarily true. The author even states that hash functions only guarantee no collisions to a high probability.
[+] DennisP|8 years ago|reply
It looks like there are about as many Earth-like planets in the universe as grains of sand on the Earth. Write your name on a grain of sand on one of those planets. Now have someone else randomly pick a single grain of sand from some planet in the universe. How certain are you that they won't pick yours?

The chance of that happening is roughly equal to the chance of a collision randomly occurring somewhere in a few quadrillion SHA256 hashes.

https://crypto.stackexchange.com/questions/52261/birthday-at...

http://www.npr.org/sections/krulwich/2012/09/17/161096233/wh...

https://www.cnet.com/news/the-milky-way-is-flush-with-habita...

[+] Blahah|8 years ago|reply
100% certainty is correct to integer rounding. It's correct to many significant figures too.
[+] peterwwillis|8 years ago|reply
> The Web is centralized,

It isn't.

> but why?

"The web" is a collection of independent networks providing a means to traverse hyperlinked text documents across any network that is addressable.

Not only are servers not a problem, host-based addressing is what makes the web work at all, as numerical addressing by itself would have killed the web's growth long ago, and users would have no reasonable way to address content.

[+] pfraze|8 years ago|reply
You're ignoring the data silo problem. Of course the Web is decentralized in terms of accessing web pages, but the majority of data is not composed of pages, and the majority of actions on the Web are not composed of hyperlink traversals. For the applications on the Web to be decentralized, the data and behaviors have to be decentralized as well, and they are not.