Cool URIs don't change (1998)

[+] Communitivity|10 years ago|reply

It is confusing to a lot of people, but they aren't functionally interchangeable.

Basically you have Uniform Resource Locators (URLs), Uniform Resource Names (URNs), and Uniform Resource Identifiers (URIs). You also have International Resource Identifiers (IRIs), which are URIs with rules allowing for international character sets in things like host names.

Every URN and URL is a URI. However, not every URI is a URN, or a URL.

A URN has a specific scheme (the front part of a URI before the :), but it does not contain instructions on how to access the identified resource. We humans might automatically map that to an access method in our head (e.g., digital object identifier URNs like doi:10.1000/182, which we who have used DOIs know maps to http://dx.doi.org/10.1000/182), but the instruction isn't in the URN.

A URL is not just an identifier but also an instruction for how to find and access the identified resource.

For example http://example.org/foo.html says to access the web resource /foo.html by using the HTTP protocol over TCP to connect to IP address which example.org resolves to, on port 80.

An example of URIs which are not URLs are the MIME content ids used to mark the boundaries within an email (cid scheme), e.g., cid:foo4%[email protected].

You can get more information at: https://tools.ietf.org/html/rfc2392

[+] andreasvc|10 years ago|reply

IMHO the distinction between URL and URI is similar to the debate on SI prefixes for bytes, or whether we should insist on calling Linux GNU/Linux; i.e., most people just don't care enough so these things will never gain currency.

[+] makecheck|10 years ago|reply

A lot of missteps in the early days of web technologies have made stable URLs impractical, unfortunately.

One problem is that someone decided to include file name extensions. Maybe this happened naturally because web servers made it so easy to expose entire directory structures to the web. And yet, this continues to be used for lots of other things. It is so ridiculous that a ".asp" or ".php" or ".cgi" causes every link, everywhere to depend on your arbitrary implementation details!

Another problem is that many software stacks are just not using brains when it comes to what would make a useful URL. Years ago I was very frustrated working with an enterprise software company that wanted to sell us a bug-tracking system and they didn’t have simple things like "server.net/123456" to access bug #123456; instead, the URL was something absolutely heinous that wouldn’t even fit on a single line (causing wrapping in E-mails and such).

Speaking of E-mail, I have received many E-mails over time that consisted of like TWELVE steps to instruct people on how to reach a file on the web. The entire concept of having a simple, descriptive and stable URL was completely lost on these people. It was always: 1. go to home page, 2. click here, ..., 11. click on annoying “content management system” with non-standard UI that generates unbookmarkable link, 12. access document. These utterly broken systems began to proliferate and it rapidly reached the point where most of the content that mattered (at least inside companies) was not available in any sane way so deep-linking to URLs became pointless.

[+] zymhan|10 years ago|reply

" There is nothing about HTTP which makes your URIs unstable. It is your organization. "

I think this could be applied to more than just how companies manage URLs.

Also, I'm trying to find a post I recently read that talked about how calling URLs "URI"s is just confusing nowadays since almost everyone still only knows the term URL, and they're functionally interchangeable.

[+] kfullert|10 years ago|reply

Was it https://news.ycombinator.com/item?id=11673058 (My url isn't your url) ?

[+] unknown|10 years ago|reply

[deleted]

[+] pidg|10 years ago|reply

Another addition to their 'Hall of Flame' might be the British Monarchy. A couple of weeks ago, they broke every existing URI when they moved from www.royal.gov.uk to www.royal.uk. Every URL from the old domain gets redirected to the root of the new site.

https://www.google.co.uk/search?q=site%3Aroyal.gov.uk

[+] iand|10 years ago|reply

You'll be pleased to learn that the National Archives keeps an archive of all UK government sites, e.g.

http://webarchive.nationalarchives.gov.uk/20130403203037/htt...

See http://nationalarchives.gov.uk/webarchive/

[+] _puk|10 years ago|reply

Never fear, the Daily Mail will surely make a call to oust the Monarchy on this basis..

You'd think ICANN would have a .monarchy TLD by now

mailto:[email protected]

[+] chias|10 years ago|reply

Interesting, the link as posted here violates the guidelines. Perhaps you meant to link to

https://www.w3.org/Provider/Style/URI

;)

[+] benjaminjosephw|10 years ago|reply

A subtle way of airing my own views on the matter... nothing to do with blindly copy-pasting. ;)

[+] oneeyedpigeon|10 years ago|reply

What's weird is that the URL gives a 'mixed-content' warning in Chrome, supposedly for the logo. But in the markup, that image is reference by a relative URL; I can't figure out why Chrome is trying to load that image via HTTP...

[+] seagreen|10 years ago|reply

This is a make-work trap for conscientious people.

If it's more efficient for your business/project to change your URIs when going through a website design, go ahead (with the knowledge that you'll lose some traffic, etc.)

Seriously, there's no reason to feel guilty over this. It's not your fault, it's the fault of a system that built two UIs into every website (the website's HTML and the URL bar -- the second of which is supposed to be useful for browsing and navigation just like the first).

If W3C actually cared about links continuing to work, they would fix it at the technical level by promoting content-addressable links instead of trying to fix it at the social level (which will never work anyway, the diligent people that care about these things will always be just a drop in the bucket).

[+] oneeyedpigeon|10 years ago|reply

I work for a publisher that produces several articles a day that include external links for reference. More and more I seem to be coming across cases where those links are now broken. In the near future, I will startup an automated script that checks for broken links (and I'm guessing I may have to get it warning on redirects since certain bad actors use redirects when they should be using 410).

When I have some decent results, I'll be ensuring the editorial team is aware of which sites in particular are prone to breaking links, and which they can trust. The net effect will be that we will be less likely to drive traffic to certain domains. Whether enough other people will do this to make any kind of meaningful difference is unknown, but it's certainly better to be a trustworthy site that it can't hurt to link to.

On a related note, I learnt this week that taken-down YouTube videos are a PITA. Not only do they give a 200 when requested, they also give zero results when looked-up via the youtube api. Sure, they can still be treated as a 'broken link' from our end, but it would be nice to be able to differentiate between a video that was taken down and one that may never have existed in the first place.

[+] zAy0LfpBZLC8mAC|10 years ago|reply

> If W3C actually cared about links continuing to work, they would fix it at the technical level by promoting content-addressable links [...]

Now, while I very much like the idea of content-addressable systems, that is not the solution to this problem. Addresses/names often are used to identify more abstract things than "this sequence of bytes". For example, company A's current list of prices is not a fixed sequence of bytes, but rather an abstract concept that refers to information that varies over time. The purpose of a name in this case is that it allows you to obtain an up-to-date version of some information.

A name that is derived from the content that you want to obtain cannot possibly do that job. Only names maintained by people who understand the continuity of those varying byte sequences can do that.

[+] nicky0|10 years ago|reply

Well exactly. You can change your URLs if you want, but it's not cool. You can choose not to be cool.

[+] _greim_|10 years ago|reply

Like a lot of things a decade and a half ago, the W3C was pushing a good idea to the point of it being an unrealistic ideal, in an environment where most people were doing the exact wrong/opposite thing.

Now the situation is different, and a lot of these messages have finally sunk into the mainstream. I think you're right; there's no need for an army of purists to keep driving these points home. We get it, we know where the trade-offs are. This needs to be interpreted in a larger context of what was going on at the time.

[+] alistairjcbrown|10 years ago|reply

I remember Jeremy Keith talking about this at dConstruct conference; he put a bet on Long Bets that the URI of the bet wouldn't change [0]

[0] - http://longbets.org/601/

[+] corford|10 years ago|reply

A bit meta but can someone tell me if Warren Buffet is on course to win his bet[0]? It's set to expire next year.

[0] http://longbets.org/362/

[+] sspiff|10 years ago|reply

I wonder what happens to long bets when the charity of the winner no longer exists by the time a bet ends.

[+] sjwright|10 years ago|reply

The most likely change would be for protocol to move from http to https (and for non-secure URIs to be 301 redirected/forced by HSTS) though I don't think that qualifies as a change under their rules.

[+] paleite|10 years ago|reply

He's actually already won. The original URL included www., the current one doesn't anymore.

[+] wtbob|10 years ago|reply

On a related note, URIs shouldn't end in extensions (use content negotiation!), content should be readable without executing code (no JavaScript necessary), content should be available in multiple languages (use content negotiation!), and RESTful interface should offer a simple forms-based interface for testing, &c.

[+] Tomte|10 years ago|reply

I've settled for "every article has an URL that looks like a folder" (including trailing slash – that particular debate is pointless), only resources like pictures look like a file with extensions.

That's easy to achieve with all CMSen, but also trivially done with a static website (the web server is configured to use index.html or whatever you like).

[+] throwanem|10 years ago|reply

> RESTful interface should offer a simple forms-based interface for testing

There are plenty of tools which can serve this need. Why complicate every implementation with duplicative functionality?

[+] MichaelGG|10 years ago|reply

Content negotiation for language and format? Not in any human-usable system, please. Sharing a URL then getting content in a different language is incredibly annoying. There may be no alternative for home pages, and using the Accept-Lang header is nicer than using the broken geo system for sure.

But it makes no sense to have a link to a document and get completely different content back based on language. And file extensions are needed since browsers don't expose a way to ask for, say, PDFs or images over HTML.

[+] hwh|10 years ago|reply

On the other hand, "extension" is just an interpretation.

[+] _puk|10 years ago|reply

Unfortunately not all CDNs respect the Vary:Accept header.

I'm sure it's behind the proliferation of the .format extension to determine response type.

[+] SilasX|10 years ago|reply

So, a spec that doesn't describe the current state of the web at all and which hosts have found unworkable and over-constraining.

[+] zeveb|10 years ago|reply

I was recently digging through my old blog's archives, and it was appalling how many URLs from the early 2000s have completely disappeared, despite the fact that the sites which served them remain — and gratifying when I was able to reload some fringe resource from 1998 or 2003.

The Web is about webs of durable readable content, not about ephemeral walled-garden apps.

[+] _puk|10 years ago|reply

Unfortunately GitHub seems to be pretty guilty of this too.

Lost count of the number of times I've clicked a link in a blog post only to end up with a GitHub 404; I'm sure it is only going to get worse.

Most annoying thing with that one is usually simply going to GitHub search and putting in the original 404'd repo name turns up the one I want.

I'm sure having the 404 serve up a "did you mean" search result would solve the issue for the most part.

[+] ljk|10 years ago|reply

> The Web is about webs of durable readable content, not about ephemeral walled-garden apps.

Facebook is the worst at this. They even require you to have an account and log in to read a public-facing community/restaurant fb page

[+] marssaxman|10 years ago|reply

I would say "was supposed to be", not "is", but would otherwise agree.

[+] hyperpape|10 years ago|reply

One downside of this: I now feel like I can't create a proper place to keep my writing or other ideas until I carefully think of a URL scheme that I can maintain for eternity.

[+] hwh|10 years ago|reply

just use random numbers.

This is meant more seriously than it might sound. But in the field of "persistent identifiers", there's a notion that languages changes over time (a common example being the word "gay"), so introducing meaning into identification schemes might not be a good idea.

[+] tux1968|10 years ago|reply

You can always post it again under a new URI and redirect the original to the new more appropriate spot.

[+] TeMPOraL|10 years ago|reply

I, on the other hand, am looking forward to carrying this responsibility! I'm about to move my blog from Wordpress to a custom solution, and besides importing all the content, I plan to set up a URL router that matches all old posts' URLs and redirect them to the same posts in the new scheme. :).

[+] brightshiny|10 years ago|reply

There's RFC 4151 to keep in mind, TagURI, a scheme for URIs that is independent of URL (but can use it too). One reason to use it would be to mark a page as being the same resource even though the domain or URL had changed.

http://www.taguri.org/

https://en.wikipedia.org/wiki/Tag_URI_scheme

I wrote a library for it in Ruby which is how I know about it.

[+] masonlee|10 years ago|reply

This is a cool scheme to address the problem of URL-impermanance due to the fact that DNS name ownership/control can change over time. It allows you to specify a URL plus a date.

I wrote a short blog post about the problem a few years back, and the Tag URI scheme ended up being one of the best solutions I came across, which is how I know about it. Some links in that post and comments may be of interest to people: https://masonlee.org/2009/08/21/is-the-web-sticky-enough/

[+] tremon|10 years ago|reply

I wonder if the footnote was also written in 1998:

Historical note: At the end of the 20th century when this was written, "cool" was an epithet of approval particularly among young, indicating trendiness, quality, or appropriateness.

[+] lstamour|10 years ago|reply

Nope. https://web.archive.org/web/19990508205057/http://www.w3.org...

It was added sometime between Nov. 27, 2001 and Dec. 14, 2001: https://web.archive.org/web/20011214140114/http://www.w3.org...

For those wondering like I was (sigh), the typos that appear in the Hall of Flame footnotes when added in early 2000 were mostly fixed sometime in 2001, the last typo "uit" fixed in 2004.

[+] pimlottc|10 years ago|reply

That's somewhat ironic given how durable the word "cool" has been, despite the general trend of slang to go out of style fairly quickly (c.f. def, bully, radical, bodacious, groovy, boss, all that, etc...).

[+] thudson|10 years ago|reply

Cool URLs still work in OmniWeb on a NeXT with a decades old bookmarks file: https://www.flickr.com/photos/osr/17082625625/lightbox

The Mondo 2000 interview bookmarks, not so much.

[+] alpb|10 years ago|reply

The example URL they list from w3.org website (http://www.w3.org/1998/12/01/chairs) is now broken. Great deal of irony.

[+] kerrsclyde|10 years ago|reply

What about case sensitivity? https://www.w3.org/provider/style/uri doesn't work.

[+] prsutherland|10 years ago|reply

Pre-DMCA. If this gets an update, TBL should add legal reasons and law enforcement as a reason for URI change.

[+] nommm-nommm|10 years ago|reply

Shouldn't there be some tradeoffs?

Say I sign up with service x with the name y. My URL is www.x.com/users/y. Years later I delete my account. Someone else signs up with the name y. Now www.x.com/users/y goes to someone else's resources. The old URL is broken.

The only way to prevent this is either give the user a url (that they will want to share) that is not meaningful or disallow anybody to sign up with a name that was ever in use and names are a resource that is very limited.

Neither seems ideal. I do agree on principal that URLS shouldn't change, though.

Hotmail actually has this problem, or at least they used to. They delete accounts that are inactive for a long time and someone else can sign up with that name. The new person can get email addressed to the previous owner.

[+] kazinator|10 years ago|reply

"Except insolvency, nothing prevents the domain name owner from keeping the name."

Ah, the halcyon days of the Internet's adolescence.

This was before fiascos like Mike Rowe, of Canada, having his mikerowsoft.com taken away.

[+] nathancahill|10 years ago|reply

At least one of their example URLs on the page still points (eventually) to the same content: http://www.nsf.gov/cgi-bin/getpub?nsf9814

Although they seem to have not learned anything, and are now using .jsp instead of .pl.

[+] miseg|10 years ago|reply

It's tough!

I have a custom PHP app that includes marketing pages.

I'd like to crowbar Wordpress into the server to serve the marketing pages instead, to make it easie to change text over time.

A .htaccess set of redirect rules may indeed work, but it's hard work to keep all URLs working.

122 comments