top | item 3407197

Hashify: what becomes possible when one is able to store documents in URLs?

186 points | potomak | 14 years ago | hashify.me | reply

77 comments

[+] rsaarelm|14 years ago|reply
I've sometimes wondered about a system where the URL of a document is an actual hash, like SHA-1, of the document. That'd change the semantics of hyperlinks from "link to document at this internet address" to "link to document with these contents", just like Hashify does, but it could handle arbitrarily large documents.

The tricky part with that system would be that you'd also need some new mechanism to retrieve the files. Instead of the regular WWW stack, you'd need something like a massive distributed hash table that could handle distributed querying and transfer of the hashed files. Many P2P file sharing systems are already doing this, but a sparse collection of end-user machines containing a few hashed files each isn't a very efficient service cloud. If every ISP had this sort of thing in their service stack, or if Amazon and Google decided to run the service, all of them dynamically caching documents in greater demand on more nodes, things might look very different.
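The scheme above can be sketched in a few lines. This is a toy illustration, not any real protocol: the dict stands in for the distributed hash table, and the function names are made up for the example.

```python
import hashlib

# Toy content-addressed store: the "URL" of a document is the SHA-1 of its
# bytes, and retrieval is a lookup by that hash. A real system would replace
# this dict with a DHT spread across many nodes.
store: dict[str, bytes] = {}

def publish(document: bytes) -> str:
    address = hashlib.sha1(document).hexdigest()
    store[address] = document
    return address  # this string is the permanent "link"

def fetch(address: str) -> bytes:
    document = store[address]
    # The address *is* the hash, so any tampering is detectable on retrieval.
    assert hashlib.sha1(document).hexdigest() == address
    return document

url = publish(b"<h1>An old page</h1>")
assert fetch(url) == b"<h1>An old page</h1>"
```

Note the properties the comment describes fall out directly: any node holding a copy can serve it, and the link stays valid as long as one copy survives, but there is no way to update the document without changing its address.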

This would mean that very old hypertext documents would still be trivially readable with working links, as long as a few copies of the page documents were still hashed somewhere, even if the original hosting servers were long gone. It would also make it easy to do distributed page caching, so that pages that get a sudden large influx of traffic wouldn't create massive load on a single server.

On the other hand, any sort of news sites where the contents of the URL are expected to change wouldn't work, nor would URLs expected to point to a latest version of a document instead of the one at the time of linking. Once the hash URL was out, no revision to the hashed document visible from following the URL would be possible without some additional protocol layer. The URL strings would also be opaque to humans and too long and random to be committed to memory or typed by hand. The web would probably need to be somehow split into human-readable URLs for dynamic pages and hash URLs for the static pieces of content served by those pages.

I'm probably reinventing the wheel here, and someone's already worked out a more thought-out version of this idea.

[+] eternalban|14 years ago|reply
> I've sometimes wondered about a system where the URL of a document is an actual hash, like SHA-1, of the document

Git.

-

It may be of interest to view this duality as an analog to the duality of location addressing (imperative) vs value addressing (functional) in the context of memory managers. The general (hand-wavy as of now) idea is a distributed memory system with a functional front-end (e.g. Scala/Haskell).
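Git's object store really is the parent comment's idea in miniature: every blob is addressed by a hash of its contents. A minimal sketch of how Git computes that address (Git prepends a small header before hashing, which is why the digest differs from a plain SHA-1 of the file):

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    # Git stores a file as a "blob" object and addresses it by the SHA-1
    # of the header "blob <size>\0" followed by the raw content.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# Matches `echo hello | git hash-object --stdin`
assert git_blob_hash(b"hello\n") == "ce013625030ba8dba906f756967f9e9ca394464a"
```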

[+] juanre|14 years ago|reply
Very neat idea, but I think the reliance on bit.ly is self-defeating. This kind of approach would allow people to distribute documents using the web without having to trust them to a particular server, which can be very convenient if your target audience is in a country where access to the server storing your documents can be blocked. For this to work you need to be able to recover the document from the URL locally.

Some years ago a friend and I wrote http://notamap.com, a very similar idea for sharing/storing/embedding geotagged notes fully encoded in a URL, without having to rely on a server. Looking at it now I wish we had not put in all the crazy animations. Maybe I should revive it and simplify the UI.

[+] rapala|14 years ago|reply
> Very neat idea, but I think the reliance on bit.ly is self-defeating. This kind of approach would allow people to distribute documents using the web without having to trust them to a particular server, which can be very convenient if your target audience is in a country where access to the server storing your documents can be blocked.

I just can't see the gain here. You need a server to distribute the URLs in any case. You are just moving the data from the server that served the document to the server that serves the URLs. It is still the same data, just in a different form.

> For this to work you need to be able to recover the document from the URL locally.

How about saving the document?

[+] mmahemoff|14 years ago|reply
I looked at URL shortener limits some time ago and found these approximate values by trial and error:

* TinyURL: 65,536 characters and probably more, but requests timed out; apparently there isn't an explicit limit

* Bit.ly: 2000 characters.

* Is.gd: 2000 characters.

* Twurl.nl: 255 characters.

This was 2.5 years ago, so I'm not sure how many of these have changed (other than bit.ly, which the linked article confirms is 2048, probably the same as when I tested it).

http://softwareas.com/the-url-shortener-as-a-cloud-database

[+] eli|14 years ago|reply
2000 is roughly the maximum length of a URL that IE can handle, incidentally.
[+] mattvot|14 years ago|reply
Boiling it down, it's a new file format with a built-in viewer. You still need to find a way to store the data.

Interesting, but I can't think of any practical application, apart from the service provider not having to worry about storage (maybe that's key ... more thinking needed).

[+] sgdesign|14 years ago|reply
I really like this approach, in fact that's what I used for http://www.patternify.com/

This way the whole tool can be 100% client-side JavaScript, with no need for any back-end.

[+] Jare|14 years ago|reply
The original version of Mr Doob's GLSL Sandbox at http://mrdoob.com/projects/glsl_sandbox/ used the same approach, but increased the maximum possible size of the document by doing LZMA compression before base64.
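The compress-then-encode trick is easy to demonstrate. A sketch in Python rather than the original's JavaScript, with an invented shader string for illustration:

```python
import base64
import lzma

def encode_for_url(text: str) -> str:
    # Compress first, then base64url-encode so the result is URL-safe.
    compressed = lzma.compress(text.encode("utf-8"))
    return base64.urlsafe_b64encode(compressed).decode("ascii")

def decode_from_url(fragment: str) -> str:
    return lzma.decompress(base64.urlsafe_b64decode(fragment)).decode("utf-8")

# Repetitive source (like GLSL boilerplate) compresses well, so the URL
# ends up shorter than a plain base64 encoding of the same document.
shader = "void main() { gl_FragColor = vec4(1.0); }" * 20
encoded = encode_for_url(shader)
assert decode_from_url(encoded) == shader
assert len(encoded) < len(base64.urlsafe_b64encode(shader.encode()))
```

Base64 inflates data by about a third, so compressing first is what buys back the headroom that makes larger documents fit under shortener and browser URL limits.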

The project later moved to http://glsl.heroku.com/ with an app-driven gallery, and that particular feature went away. I think that is a pretty natural evolution of any such idea, so I'm not convinced of hashify's longevity, but hey, sometimes simple really is enough.

[+] samgranger|14 years ago|reply
Cool to see, but a stupid idea: who in their right mind would use this in production?! By using such a "technology", you lose SEO strength due to urls-not-being-like-this.html, and even worse, what can stop me from publishing a fake press release on their site or spamming porn and getting that URL indexed? And what are the benefits? To bring SOPA into this as well, couldn't I share copyrighted material on someone's site like this? How could they control that, besides blocking each URL manually? Just seems dumb. As a concept, cool, but for production.... Yikes?!
[+] samgranger|14 years ago|reply
Obviously you could also keep a database of all the content URL strings you publish - but that makes this technology worth nothing at all.
[+] Angostura|14 years ago|reply
"Internet Explorer cannot display the webpage" is what happens here (IE 8).
[+] tkellogg|14 years ago|reply
Not really a good reply, but I think that hashify.me's potential for an IE audience was probably small to start with. But consider this: if this idea took off, wouldn't this press MS into keeping IE more modern?
[+] dools|14 years ago|reply
I took a similar approach with http://cueyoutube.com and recently found Snapbird, which gives extended Twitter search capabilities. The URL contains the playlist and Twitter becomes the database, so I just tweet my playlists and they're "saved". You can see all the lists I've created by searching the account iaindooley for the term cueyoutube in Snapbird.
[+] cobychapple|14 years ago|reply
What becomes possible? The entire internet could effectively get rid of hosting account providers, with each page in every site being contained in a hashify URL, and with each page linking to other pages using other hashify URLs.

Trouble is, there might be a DNS-like system needed to match hashify URLs to more human-readable strings (or a way for existing DNS to resolve to hashify style URLs).

Neat idea.

[+] vidarh|14 years ago|reply
The data needs to be stored somewhere. In their implementation they in effect use bit.ly as the hosting provider for the data by shortening the URLs, so while it's a fun little experiment, it boils down to a content-addressable system. We already have good examples of content-addressable systems; Git, for example, is built on content-addressable storage.
[+] friggeri|14 years ago|reply
The real trouble is that when you link to a hashified URL, you are actually embedding in your web page (an encoding of) the content of the page you are linking to. Think matryoshka.
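The matryoshka effect is easy to see with a toy version of the scheme (the URL prefix and encoding here are simplified for illustration, not hashify's exact format):

```python
import base64

def hashify_url(markdown: str) -> str:
    # hashify-style URL: the document itself, base64-encoded into the path.
    return "http://hashify.me/" + base64.b64encode(markdown.encode()).decode()

inner = "Just a short note."
inner_url = hashify_url(inner)

# Linking to the inner page means embedding its entire (encoded) content,
# so the outer document, and therefore the outer URL, grows with every layer.
outer = "See [this note](%s) for details." % inner_url
outer_url = hashify_url(outer)
assert len(outer_url) > len(inner_url) > len(inner)
```

Each level of linking wraps the full text of everything below it, so deeply interlinked pages blow up combinatorially rather than staying constant-size like ordinary hyperlinks.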
[+] seanp2k2|14 years ago|reply
In essence, this would be moving away from a model of "large networks of connected pages/sites" to "a large collection of single documents with no meaningful mechanism of inter-connectedness".

Think about this like a PDF where stuff is embedded instead of in separate files.

[+] feralchimp|14 years ago|reply
Clever? Yes.

But URL shortening services are a public good, and hacking one to be your personal cloud storage platform is kind of a dick move.

[+] jroseattle|14 years ago|reply
This is cool, but I wouldn't use it for any real documents. I care about versioning, edit history, etc.
[+] tony_le_montana|14 years ago|reply
Great idea, this. But it saves on each edit and is likely to hit bit.ly's rate limit :(
[+] orclev|14 years ago|reply
This is an ancient idea. I read a 2600 article back in the early 2000s or possibly late 1990s that did essentially this same thing using a bash script and one of the first URL shortening services available at the time.
[+] eternalban|14 years ago|reply

   What has been will be again,
   what has been done will be done again;
   there is nothing new under the sun.

   - Ecclesiastes 1:9
[+] seanp2k2|14 years ago|reply
...and the general concept of "embedding one type of data inside another" predates computers and modern civilization.
[+] markkum|14 years ago|reply
Check out https://neko.io/ ... we are scrambling/encrypting messages into URLs which you can then share on Facebook, Twitter or wherever.