
Immutable URLs

91 points | swombat | 13 years ago | medium.com

45 comments

[+] JeremyBanks|13 years ago|reply
Freenet's Content Hash Key URIs are one example of this idea in practice.

https://wiki.freenetproject.org/Content_Hash_Key

BitTorrent's "magnet" URIs could be seen as another. I always liked the idea of using torrents to host static web content. There are downsides, but they would be worth it in many cases.

If you used the torrent info hash as the primary identifier of the web content, but also embedded an HTTP URL that the data could be served from directly, you could have secure immutable content with almost the same performance as a regular website. The torrent data could be used to verify the HTTP data, and the browser would fall back to downloading from the torrent network if the website was unavailable or served invalid content.

(This would probably require a bit of original design, since I don't think there's an existing convention for getting the actual torrent data over HTTP instead of from peers (DHT), but that's minor.)
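The verification step is simple enough to sketch. A minimal Python example, assuming the client already has the piece length and SHA-1 piece hashes from the torrent metadata (the function name and parameters are hypothetical):

```python
import hashlib

def verify_pieces(data: bytes, piece_length: int, piece_hashes: list) -> bool:
    """Check HTTP-fetched bytes against the torrent's SHA-1 piece hashes."""
    for i, expected in enumerate(piece_hashes):
        piece = data[i * piece_length:(i + 1) * piece_length]
        if hashlib.sha1(piece).digest() != expected:
            # Mismatch: the website served invalid content, so the
            # browser would fall back to the torrent swarm here.
            return False
    return True
```

If this returns False (or the HTTP request fails outright), the client falls back to the DHT/peer download described above.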

[+] eksith|13 years ago|reply
Yes, the article is basically talking about turning the web into something it currently isn't. In a strange way, this was what the web was when it was very young; a bunch of inter-linked documents written in static HTML that rarely moved around.

But now we have something of a hodgepodge bazaar. For URLs to truly not move around and survive the creator and his/her circumstances, there needs to be a distributed repository. I don't know if Freenet will be that repository (the one time I tried it, it was glacially slow). Maybe Bittorrent's Sync project will pave the way to create a truly universal, persistent, content repository with permanent URI(L)s.

[+] mtrimpe|13 years ago|reply
While magnet URIs come close I think this would actually be a better match for a purely functional data store, like Datomic for example.

If you namespaced each Datomic database and added a transaction ID, you would get a reference to an immutable snapshot of that entire data store (or any piece of it), like datomic://myhost:<transaction-id>/path-or-query-into-db
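A rough sketch of how a client might pick such a reference apart (note the datomic:// scheme here is the comment's invention, not a real Datomic feature):

```python
from urllib.parse import urlparse

def parse_snapshot_uri(uri: str):
    """Split a hypothetical datomic://host:<tx-id>/path reference into
    the database host, the transaction ID pinning the immutable
    snapshot, and the path or query into the database."""
    parts = urlparse(uri)
    host, _, tx = parts.netloc.partition(":")
    return host, int(tx), parts.path.lstrip("/")
```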

The disadvantage is that this gives you data rather than a website. However, you could use Functional Reactive Programming to auto-generate the site from that data store, giving you the 'website view' again.

That of course still allows your program to be lost; but if you added the program to that same purely functional data store, thus versioning it too, then that is no longer a problem either.

And once you've done that call me, since you'll have built what I've been dreaming of for the past decade.

[+] herghost|13 years ago|reply
This cuts against the nature of real life.

I regularly speak with groups of high school pupils about privacy and one of my main points to them is that once they commit their latest brain-fart to the internet, there is a very real chance that it becomes immutable - should it go viral, for instance. If it were absolutely guaranteed that it would become immutable though, that would be a game changer.

Can you imagine if everything that you had ever said at any point in your life was permanently journaled and indexed and searchable? I personally find that to be a horrific concept from a privacy point of view.

From a purely technical point of view I can see the benefit of this idea; I hate it when an old article that I've bookmarked no longer exists (even if by "article" I mean a gif that made me chuckle). But seriously, there's a world of difference between a newspaper article being permanently available and a MySpace profile, Facebook post or tweet being there forever.

[+] gojomo|13 years ago|reply
I wonder if it is better to warn the young about the potential permanence of online expression... or let them take those risks, and then as they grow, manage the world that results.

They might negotiate new norms, of forgiveness and understanding towards prior selves.

[+] Osmium|13 years ago|reply
Sounds lovely in theory...

I have to say, a service I would love would be a website that mirrors content but only if the original source went down, and otherwise redirects to the original site. Imagine something like a URL shortener, but the URL it gives you will redirect to a cached copy of the original page in the event that the original page disappears. That way, you can link and give credit to the people who made the content, but if something happens it isn't lost from the internet for good. It would, in a sense, be a "permanent URL" service. It'd be great for citations too, e.g. wikipedia, academia, etc. I'm not sure if that's what the OP is getting at here, or if he's suggesting something else?
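The core decision such a service makes is small enough to sketch. Everything here (the registry, the cache, and the liveness probe) is a hypothetical stand-in for real infrastructure:

```python
def resolve(short_id, registry, cache, is_live):
    """Decide where a 'permanent URL' should send a visitor.

    registry maps short IDs to original URLs, cache holds archived
    copies, and is_live(url) probes whether the original still
    responds.
    """
    original = registry[short_id]
    if is_live(original):
        return ("redirect", original)   # original still up: credit the source
    return ("cached", cache[short_id])  # original gone: serve the archive
```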

Either way, it's too bad rights issues would probably stop something like that from ever being made.

[+] spc476|13 years ago|reply
As you say, rights issues would probably stop something like this, and I have a few stories that show two sides of the rights issues.

1) Back in 1998, the company hosting my website received a cease and desist letter from a company that held the "Welcome Wagon(TM)" trademark because of a page I had on my website. That prompted me to get my own domain and move the content over (and I was able to get proper redirects installed on the company webserver). I was happy (I had my own domain, a ".org" and apparently, that was enough to keep the lawyers at bay). The hosting company was happy (they didn't have to deal with the cease and desist letter) and the trademark holding company was happy (they protected their trademark like they're legally required to). I'm sure that the trademark company would be upset if their trademark was still "in use" at [redacted].com (the hosting company, long gone by now).

2) I hosted a friend's blog on my server. A few months later he asked me to take the blog down, for both personal and possibly legal reasons (he was afraid of litigation from his employer, who had a known history of suing employees, but that's not my story to tell). I'm sure he would be upset (and potentially a lot poorer) had his content remained online for all to see.

3) I've received two requests to remove information on my blog. The first time (http://boston.conman.org/2001/08/22.2) someone didn't quite grasp the concept that domain name registration information is public, but I didn't feel like fighting someone whose grasp of English wasn't that great to begin with, and removed the information. The second time (http://boston.conman.org/2001/11/30.1) was due to a mistake, so I blacked out identifying information. I didn't want to remove the page, because, you know, cool URLs don't change (http://www.w3.org/Provider/Style/URI.html); yet the incident was a mistake. There's no real point in seeing the non-redacted version, nor do I really want people to see the non-redacted version.

There are a ton of corner-cases like these to contend with. Just one reason why Ted Nelson's version of hypertext never got off the ground.

[+] clarkm|13 years ago|reply
W3C's Permanent Identifier Community Group maintains https://w3id.org/ which performs a similar service.
[+] n0nick|13 years ago|reply
I really like your idea, but what about content being changed/updated, instead of deleted?

For some use cases it would make sense to show the cache (when the original quote is no longer there), while for others it'd make sense to forward (after some style update, or an important addition).

How do you think such a service could handle this?

[+] manmal|13 years ago|reply
I think you can achieve that effect with auto-scaling, for example on Elastic Beanstalk. If AWS goes down though, that won't help (but most probably such a service would run on AWS anyway :)).
[+] brc|13 years ago|reply
See webcitation.org - copies the page at time of linking.
[+] skrause|13 years ago|reply
Thinking of the first content I published on the web as a teenager some 15 years ago, I'm happy it's gone now.
[+] Teapot|13 years ago|reply
Sure. But wait, you might change your mind in the future. Nostalgia perhaps.
[+] edwintorok|13 years ago|reply
"A cool URI is one which does not change." http://www.w3.org/Provider/Style/URI
[+] jackalope|13 years ago|reply
Once upon a time, I drank this koolaid, but no more. Many things in life are ephemeral, including information. To suggest that a webmaster's responsibility is to hoard data for eternity is both scatological and counterproductive. As the Web matures, it is threatened far more by the growing mountain of obsolete information that must be ignored in order to find anything timely and relevant. I would much rather see these pages deleted if they aren't going to be updated, even if it means broken URIs, which will eventually fade away.
[+] hobs|13 years ago|reply
This is basically the idea that Julian Assange was putting forth in that article a few weeks ago about his secret meeting with Larry Page.

It's interesting to see that people are now saying this is a bad idea but were praising his version of it.

[+] yottabyte47|13 years ago|reply
Seems like a bad idea. People think that once something is on the internet it's there forever but that's simply not the case. Hard drives develop errors, servers get shut down, backups get corrupted, etc. etc. Your stuff may be around for a long while but there's no guarantee that it will be permanently accessible. If you want the contents of a web page to be available to you then download said page to your computer and do proper backups, etc. This will increase the likelihood that said data will survive. This is not a problem with URLs.
[+] chewxy|13 years ago|reply
Didn't Julian Assange suggest this? There was an interview published last week with Eric Schmidt, where this was suggested.

I've since started work on a side project that does this - to be integrated into Fork the Cookbook - since our target audience seems to be very up-in-arms about original recipes.

[+] gphil|13 years ago|reply
I've had this thought before but it seems like the natural key for a web resource has to be the URL (location) plus the time that the resource was accessed for practical reasons.

Pages are expected by the end user to change over time, but they also expect to access them at the same location each time.

[+] derefr|13 years ago|reply
> I've had this thought before but it seems like the natural key for a web resource has to be the URL (location) plus the time that the resource was accessed for practical reasons.

These are called Dated URIs/DURIs: http://tools.ietf.org/html/draft-masinter-dated-uri-10

No browser currently implements them, but a viable resolution mechanism probably involves keeping a default store of Memento Time-Gates (http://www.mementoweb.org/guide/quick-intro/) and querying them to see if any of them have a copy of your resource for that date.
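Querying a TimeGate is just an HTTP GET with an Accept-Datetime header (RFC 7089). A minimal sketch; the TimeGate URL and target page below are placeholder examples:

```python
import urllib.request

def memento_request(timegate: str, uri: str, when: str):
    """Build a request asking a Memento TimeGate for the archived
    copy of `uri` closest to `when` (an RFC 1123 date string)."""
    req = urllib.request.Request(timegate + uri)
    req.add_header("Accept-Datetime", when)
    return req

# e.g. urllib.request.urlopen(memento_request(
#     "http://timetravel.mementoweb.org/timegate/",
#     "http://example.com/",
#     "Thu, 01 Apr 2010 00:00:00 GMT"))
```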

[+] bowietrousers|13 years ago|reply
This is a cute, facile idea, but not thought through. It's not a problem of technology per se - content itself doesn't want or need to live forever. I reserve the right to alter or remove content that I publish.

It's trendy to think of the web as completely stateless, distributed etc, but the reality is that it's not. The state of resources changes over time because the world changes - and URIs are only around to reflect that.

The problem with HTTP is that with a 404 you mostly can't tell the difference between "it's not there (and never was)" and "it's not there (but used to be, and has gone away)". Servers should send a 410 to reflect the latter.
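In server terms the distinction is a single branch. A sketch where `live` and `gone` are hypothetical sets of paths the server tracks:

```python
def status_for(path: str, live: set, gone: set) -> int:
    """Pick a status code that preserves history: 410 Gone means
    'this existed and was deliberately removed', while 404 means
    'this was never here' (or the server has simply forgotten)."""
    if path in live:
        return 200
    if path in gone:
        return 410
    return 404
```

The catch, of course, is that the server has to remember every URL it ever served in order to answer 410 honestly.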

[+] felipelalli|13 years ago|reply
After reading that I thought: someday I have to reinvent the web. Another engine, another kind of software, not even called the "web". The web's structure is so old-fashioned. Have you ever noticed how different each page is from every other? It's bad for the end user! Think about it: Android, Windows and Mac all try to establish standards so the user doesn't have to think again about repetitive tasks. Different layouts demand unnecessary brain effort. I know this is the beautiful anarchy of the web, but it isn't practical. It's possible to be beautiful and still follow minimum standards; the iPhone is there to prove it.
[+] adregan|13 years ago|reply
It's very interesting that an immutable web could make current real world immutable objects (printed books, etc.) appear more flexible, more mutable. With a book, you can write in the margins of a particular copy, every copy could be lost, but a distributed system of permanent content would persist without marginalia or utter destruction.

The web is amazing because of its participatory potential and its archival abilities. What might be more interesting than simply having immutable content is palimpsested content, where the original object always exists beneath later changes and additions.

[+] alberth|13 years ago|reply
"Immutable URLs" already exist: they're called URNs [1], and they've been a standard since 1997.

[1] http://en.wikipedia.org/wiki/Uniform_Resource_Name

[+] vy8vWJlco|13 years ago|reply
I think people are re-discovering/re-inventing Berners-Lee's semantic web and "linked data" descriptions (having the epiphany on their own) in part because Berners-Lee, for all his excitement, often fails at presenting the very basic idea, and I think it's because he takes the idea of a static URI for granted.

By using a URI like a globally-unique primary key - a symbolic link - into "the database of the web," in place of the content itself (not just as a pointer to the next page of cats), you can begin to use all of the web as the data set and something like XPath/XQuery as the query language.

Before any of that can happen, people need to really accept that URIs/URLs can't change their semantic content and rarely-if-ever go away. That's a big problem with the current approaches to displaying content: the references they generate are presumed to be forgettable.

[+] dvanduzer|13 years ago|reply
A URN is still a pointer. An ISBN is not the book.
[+] alexpopescu|13 years ago|reply
URIs are pretty much immutable. My impression is that what the OP suggests is a guaranteed lifetime of the content associated with the URI.

As for this second part, "once it's published it should always remain out there", I'm not very sure it's a good idea. In many cases I'd actually like to be able to say that a piece of content has expired (the content is not relevant anymore).

[+] aklemm|13 years ago|reply
If there aren't enough interested readers to encourage a maintainer to keep the content available, is it really worth the effort to auto-archive all of it?