top | item 3407197

Hashify: what becomes possible when one is able to store documents in URLs?

186 points | potomak | 14 years ago | hashify.me | reply

77 comments

[+] rsaarelm|14 years ago|reply
I've sometimes wondered about a system where the URL of a document is an actual hash, like SHA-1, of the document. That'd change the semantics of hyperlinks from "link to document at this internet address" to "link to document with these contents", just like Hashify does, but it could handle arbitrarily large documents.

The tricky part with that system would be that you'd also need some new mechanism to retrieve the files. Instead of the regular WWW stack, you'd need something like a massive distributed hash table that could handle distributed querying and transfer of the hashed files. Many P2P file sharing systems are already doing this, but a sparse collection of end-user machines containing a few hashed files each isn't a very efficient service cloud. If every ISP had this sort of thing in their service stack, or if Amazon and Google decided to run the service, all of them dynamically caching documents in greater demand on more nodes, things might look very different.
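The scheme above can be sketched in a few lines. This is a toy illustration, not any real protocol: the dict stands in for the distributed hash table, and the function names are made up for the example.

```python
import hashlib

# Toy content-addressed store: the "URL" of a document is the SHA-1 of its
# bytes, and retrieval is a lookup by that hash. A real system would replace
# this dict with a DHT spread across many nodes.
store: dict[str, bytes] = {}

def publish(document: bytes) -> str:
    address = hashlib.sha1(document).hexdigest()
    store[address] = document
    return address  # this string is the permanent "link"

def fetch(address: str) -> bytes:
    document = store[address]
    # The address *is* the hash, so any tampering is detectable on retrieval.
    assert hashlib.sha1(document).hexdigest() == address
    return document

url = publish(b"<h1>An old page</h1>")
assert fetch(url) == b"<h1>An old page</h1>"
```

Note the properties the comment describes fall out directly: any node holding a copy can serve it, and the link stays valid as long as one copy survives, but there is no way to update the document without changing its address.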

This would mean that very old hypertext documents would still be trivially readable with working links, as long as a few copies of the page documents were still hashed somewhere, even if the original hosting servers were long gone. It would also make it easy to do distributed page caching, so that pages that get a sudden large influx of traffic wouldn't create massive load on a single server.

On the other hand, any sort of news sites where the contents of the URL are expected to change wouldn't work, nor would URLs expected to point to a latest version of a document instead of the one at the time of linking. Once the hash URL was out, no revision to the hashed document visible from following the URL would be possible without some additional protocol layer. The URL strings would also be opaque to humans and too long and random to be committed to memory or typed by hand. The web would probably need to be somehow split into human-readable URLs for dynamic pages and hash URLs for the static pieces of content served by those pages.

I'm probably reinventing the wheel here, and someone's already worked out a more thought-out version of this idea.

[+] eternalban|14 years ago|reply
> I've sometimes wondered about a system where the URL of a document is an actual hash, like SHA-1, of the document

Git.

-

It may be of interest to view this duality as an analog to the duality of location addressing (imperative) vs value addressing (functional) in the context of memory managers. The general (hand-wavy as of now) idea is a distributed memory system with a functional front-end (e.g. Scala/Haskell).
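Git's object store really is the parent comment's idea in miniature: every blob is addressed by a hash of its contents. A minimal sketch of how Git computes that address (Git prepends a small header before hashing, which is why the digest differs from a plain SHA-1 of the file):

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    # Git stores a file as a "blob" object and addresses it by the SHA-1
    # of the header "blob <size>\0" followed by the raw content.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# Matches `echo hello | git hash-object --stdin`
assert git_blob_hash(b"hello\n") == "ce013625030ba8dba906f756967f9e9ca394464a"
```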

[+] juanre|14 years ago|reply
Very neat idea, but I think the reliance on bit.ly is self-defeating. This kind of approach would allow people to distribute documents using the web without having to trust them to a particular server, which can be very convenient if your target audience is in a country where access to the server storing your documents can be blocked. For this to work you need to be able to recover the document from the URL locally.

Some years ago a friend and I wrote http://notamap.com, a very similar idea for sharing/storing/embedding geotagged notes fully encoded in a URL, without having to rely on a server. Looking at it now I wish we had not put in all the crazy animations. Maybe I should revive it and simplify the UI.

[+] rapala|14 years ago|reply
> Very neat idea, but I think the reliance on bit.ly is self-defeating. This kind of approach would allow people to distribute documents using the web without having to trust them to a particular server, which can be very convenient if your target audience is in a country where access to the server storing your documents can be blocked.

I just can't see the gain here. You need a server to distribute the URLs in any case. You are just moving the data from the server that served the document to the server that serves the URLs. It is still the same data, just in a different form.

> For this to work you need to be able to recover the document from the URL locally.

How about saving the document?

[+] mmahemoff|14 years ago|reply
I looked at URL shortener limits some time ago and found these approximate values by trial and error:

* TinyURL: 65,536 characters and probably more, but requests timed out; apparently there isn't an explicit limit

* Bit.ly: 2000 characters.

* Is.gd: 2000 characters.

* Twurl.nl: 255 characters.

This was 2.5 years ago, so I'm not sure how many of these have changed (other than bit.ly, which the linked article confirms is 2048, probably the same as when I tested it).

http://softwareas.com/the-url-shortener-as-a-cloud-database

[+] eli|14 years ago|reply
2000 is roughly the maximum length of a URL that IE can handle, incidentally.
[+] mattvot|14 years ago|reply
Boiling it down, it's a new file format with a built-in viewer. You still need to find a way to store the data.

Interesting, but I can't think of any practical application, apart from the service provider not having to worry about storage (maybe that's key ... more thinking needed).

[+] sgdesign|14 years ago|reply
I really like this approach, in fact that's what I used for http://www.patternify.com/

This way the whole tool can be 100% client-side JavaScript, with no need for any back-end.

[+] Jare|14 years ago|reply
The original version of Mr Doob's GLSL Sandbox at http://mrdoob.com/projects/glsl_sandbox/ used the same approach, but increased the maximum possible size of the document by doing LZMA compression before base64.
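The compress-then-encode trick is easy to demonstrate. A sketch in Python rather than the original's JavaScript, with an invented shader string for illustration:

```python
import base64
import lzma

def encode_for_url(text: str) -> str:
    # Compress first, then base64url-encode so the result is URL-safe.
    compressed = lzma.compress(text.encode("utf-8"))
    return base64.urlsafe_b64encode(compressed).decode("ascii")

def decode_from_url(fragment: str) -> str:
    return lzma.decompress(base64.urlsafe_b64decode(fragment)).decode("utf-8")

# Repetitive source (like GLSL boilerplate) compresses well, so the URL
# ends up shorter than a plain base64 encoding of the same document.
shader = "void main() { gl_FragColor = vec4(1.0); }" * 20
encoded = encode_for_url(shader)
assert decode_from_url(encoded) == shader
assert len(encoded) < len(base64.urlsafe_b64encode(shader.encode()))
```

Base64 inflates data by about a third, so compressing first is what buys back the headroom that makes larger documents fit under shortener and browser URL limits.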

The project later moved to http://glsl.heroku.com/ with an app-driven gallery, and that particular feature went away. I think that is a pretty natural evolution of any such idea, so I'm not convinced of hashify's longevity, but hey, sometimes simple really is enough.

[+] samgranger|14 years ago|reply
Cool to see, but a stupid idea: who in their right mind would use this in production?! By using such a "technology", you lose SEO strength due to urls-not-being-like-this.html, and even worse, what can stop me from publishing a fake press release on their site or spamming porn and getting that URL indexed? And what are the benefits? To bring SOPA into this as well, couldn't I share copyrighted material on someone's site like this? How could they control that, besides blocking each URL manually? Just seems dumb. As a concept, cool, but for production.... Yikes?!
[+] samgranger|14 years ago|reply
Obviously you could also keep a database of all the content URL strings you publish - but that makes this technology worth nothing at all.
[+] Angostura|14 years ago|reply
"Internet Explorer cannot display the webpage" is what happens here (IE 8).
[+] tkellogg|14 years ago|reply
Not really a good reply, but I think that hashify.me's potential for an IE audience was probably small to start with. But consider this: if this idea took off, wouldn't this press MS into keeping IE more modern?
[+] dools|14 years ago|reply
I took a similar approach with http://cueyoutube.com and recently found Snapbird, which gives extended Twitter search capabilities. The URL contains the playlist and Twitter becomes the database, so I just tweet my playlists and they're "saved". You can see all the lists I've created by searching the account iaindooley for the term cueyoutube in Snapbird.
[+] cobychapple|14 years ago|reply
What becomes possible? The entire internet could effectively get rid of hosting account providers, with each page in every site being contained in a hashify URL, and with each page linking to other pages using other hashify URLs.

Trouble is, there might be a DNS-like system needed to match hashify URLs to more human-readable strings (or a way for existing DNS to resolve to hashify style URLs).

Neat idea.

[+] vidarh|14 years ago|reply
The data needs to be stored somewhere. In their implementation they in effect use bit.ly as the hosting provider for the data by shortening the URLs, so while it's a fun little experiment, it boils down to a content-addressable system. We already have good examples of content-addressable systems; Git, for example, is built on content-addressable storage.
[+] friggeri|14 years ago|reply
The real trouble is that when you link to a hashified URL, you are actually embedding in your web page (an encoding of) the content of the page you are linking to. Think matryoshka.
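The matryoshka effect is easy to see with a toy version of the scheme (the URL prefix and encoding here are simplified for illustration, not hashify's exact format):

```python
import base64

def hashify_url(markdown: str) -> str:
    # hashify-style URL: the document itself, base64-encoded into the path.
    return "http://hashify.me/" + base64.b64encode(markdown.encode()).decode()

inner = "Just a short note."
inner_url = hashify_url(inner)

# Linking to the inner page means embedding its entire (encoded) content,
# so the outer document, and therefore the outer URL, grows with every layer.
outer = "See [this note](%s) for details." % inner_url
outer_url = hashify_url(outer)
assert len(outer_url) > len(inner_url) > len(inner)
```

Each level of linking wraps the full text of everything below it, so deeply interlinked pages blow up combinatorially rather than staying constant-size like ordinary hyperlinks.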
[+] seanp2k2|14 years ago|reply
In essence, this would be moving away from a model of "large networks of connected pages/sites" to "a large collection of single documents with no meaningful mechanism of inter-connectedness".

Think about this like a PDF where stuff is embedded instead of in separate files.

[+] feralchimp|14 years ago|reply
Clever? Yes.

But URL shortening services are a public good, and hacking one to be your personal cloud storage platform is kind of a dick move.

[+] jroseattle|14 years ago|reply
This is cool, but I wouldn't use it for any real documents. I care about versioning, edit history, etc.
[+] tony_le_montana|14 years ago|reply
Great idea, this. But it saves on each edit and is likely to hit bit.ly's rate limit :(
[+] orclev|14 years ago|reply
This is an ancient idea. I read a 2600 article back in the early 2000s or possibly late 1990s that did essentially this same thing using a bash script and one of the first URL shortening services available at the time.
[+] eternalban|14 years ago|reply

   What has been will be again,
   what has been done will be done again;
   there is nothing new under the sun.

   - Ecclesiastes 1:9
[+] seanp2k2|14 years ago|reply
...and the general concept of "embedding one type of data inside another" predates computers and modern civilization.
[+] markkum|14 years ago|reply
Check out https://neko.io/ ... we are scrambling/encrypting messages into URLs which you can then share on Facebook, Twitter or wherever.