item 2201104

Broken Links

303 points | martey | 15 years ago | tbray.org

90 comments

[+] zemaj|15 years ago|reply
Despite all the FUD around hashbangs, the genuine problem I see with them is that they optimise for internal page loads, not the entry into a website. For example, with hashbangs, requests to Twitter when logged in go like this:

1) HTTP GET http://twitter.com/some_account [~500ms for me]

2) 302 redirect -> HTTP GET http://twitter.com/ [~600ms for me]

3) HTML tells browser to download some JS -> HTTP GET bundle.js [~500ms for me] (concurrently here we start getting CSS)

4) JS reads hashbang & request actual data -> HTTP GET data.json [~500ms for me]

... only after about 2 seconds can we start(!) rendering data. Then there's about another 2 seconds for all the JSON and CSS calls to complete. It takes upwards of 4 seconds for a Twitter page to render for me (the Load event lies, as it fires well before actual data shows; try it yourself with your favourite browser inspector).

When not using hashbangs, a single HTTP request can get all the data for the page and start rendering it. One blocking CSS call (possibly cached) is all that's needed for styling.

Hence when I see an external link with a hashbang it frustrates me (barely perceptibly), because I know that when I load the page it's going to take longer than a normal HTTP request. Significantly longer. While subsequent page loads are faster, those aren't the ones you want to optimise for if you care about bounce rates. This issue affects every new link clicked into a website, so it affects an even larger number of requests than normal bounces.

Hashbangs are a good solution to an important problem, but I don't see them as a tool to build entire websites upon. Fortunately I see the performance issue as one which will result in people voting with their browsers and choosing sites which only use hashbangs when they genuinely improve the user experience - especially since they're easily visible in the url.

[+] wmf|15 years ago|reply
Basically, NewTwitter isn't a Web site, it's an app and you have to "launch" it before you can do anything.
[+] romaniv|15 years ago|reply
Hashbangs are a workaround. A good _solution_ would be something that doesn't require running JavaScript and doesn't mess with URL/document models most of the Web is based on.

For example, browsers could implement partial caching. Here is how it could work. The first time the browser requests a page, it gets all the content in the response. However, some fragments of the content are identified as cacheable and marked with unique ids. When a browser requests a page for the second time, it sends a list of identifiers for the cached fragments to the server. The server then doesn't render those fragments, but places small placeholders/identifiers where they should be substituted into page content.

---

First Request

GET index.html

---

First Response

[cacheable id="abc"] [h1]This is twitter[/h1] bla bla bla, header content [/cacheable] ... Page content ... [cacheable id="xyz"] footer content [/cacheable]

---

Second Request

GET index.html
Cached: abc, xyz

---

Second Response

[fragment id="abc" /] ... Page content ... [fragment id="xyz" /]
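
A minimal sketch of the client-side half of this hypothetical protocol (the [cacheable]/[fragment] tags and the Cached header are invented for illustration, not any real standard):

```javascript
// Hypothetical browser-side fragment cache for the scheme above:
// first response: remember [cacheable] bodies by id;
// later responses: expand [fragment] placeholders from the cache.

const cache = new Map();

// Process a first response, storing cacheable fragments as we render them.
function storeFragments(html) {
  const re = /\[cacheable id="([^"]+)"\]([\s\S]*?)\[\/cacheable\]/g;
  return html.replace(re, (_, id, body) => {
    cache.set(id, body);
    return body; // render the content now, reuse it next time
  });
}

// Process a second response, substituting placeholders from the cache.
function expandFragments(html) {
  return html.replace(/\[fragment id="([^"]+)" \/\]/g,
    (_, id) => cache.get(id) ?? "");
}

// The header the browser would send on the second request.
function cachedHeader() {
  return "Cached: " + [...cache.keys()].join(", ");
}
```

The server-side half would simply skip rendering any fragment whose id appears in the Cached header.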

[+] pilif|15 years ago|reply
With pushState not widely implemented, you have three choices:

1) don't use AJAX in response to actions that alter the page content in a significant way. This of course forces page reloads and prevents the cool emerging pattern of not serving dynamic HTML but just having a REST API and doing the rendering client-side.

2) you do the ajaxy stuff but you don't touch the URL. This leads to a non-working back button and prevents users from bookmarking or sharing links to specific views. You can work around this Google Maps style with some explicit "link to this page" functionality, but I would guess people just don't get that.

3) you do the fragment change thing, which allows for ajaxy page content changes but also makes the back button work and keeps links inherently shareable and bookmarkable, at the cost of that one redirect and of screen-scrapability, and maybe at the cost of confusing techies (normal people probably don't care either way).

pushState can look like a salvation, but keep one thing in mind: to keep the page working for browsers without JS (and screen scrapers), you will have to do your work twice and STILL render dynamic content on the server, which is something people are now beginning to try to avoid.

Finally, as pushState is yet another not-widely-deployed thing, for the next five to ten years you would have to do all of this three times: dynamic HTML generation for the purists, pushState for the new browsers, and fragment change for IE.

Personally, I really feel that fragment change is a good compromise as it works with browsers and even in IE while still allowing the nice pattern of not rendering anything on the server and keeping the URLs shareable.

Maybe this current uproar is based on a) techies not being used to this (normal people don't notice) and b) badly broken JS that sometimes prevents views from rendering AT ALL, but this is not caused by an inherent problem with the technology: if I screw up the server-side rendering, the page will be as empty as it is if I screw up on the client side.

[+] andrewgodwin|15 years ago|reply
The main problem with the fragment change solution is that it _doesn't work without JavaScript_. And we're not talking for the one user browsing the site - any links people post (on forums, mailing lists, etc.) that have fragments in them are simply unusable for people without JavaScript, as the server does not get sent the fragment - the best it can do is send a generic "oh, sorry, no JS" page back.

This would be a problem for search engines as well, if it wasn't for the awful translation Google said they'd do. It's just breaking the meaning of fragment identifiers completely, and that really makes me worried.

[+] othermaciej|15 years ago|reply
pushState with non-hash URLs doesn't require you to do server-side HTML generation. You can just send a stub page which looks at the URL and loads the right data, just as with hash URLs. To deploy it incrementally, you only really need one code path with a slight fork depending on whether the current URL contains a #! and whether the current browser supports pushState.
[+] othermaciej|15 years ago|reply
HTML5 "AJAX History", also known as History.pushState, can solve this problem. It allows a website to update its contents with AJAX, but change the URL to a real URL that will actually retrieve the proper resource direct from the server, while maintaining proper back-forward navigation.

See <http://dev.w3.org/html5/spec/Overview.html#dom-history-pushs...> for spec details.

It's in Safari, Chrome and Firefox. While Opera and IE don't have it yet, it would be easy to use conditionally on browsers that support it. I'm a little surprised that more sites don't use it.

EDIT: This chart shows what browsers it will work in: http://caniuse.com/history
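
A rough sketch of the conditional use described here, with a hashbang fallback for browsers without pushState (the function names are illustrative, not from any particular library):

```javascript
// Sketch: use history.pushState where available, fall back to a
// hashbang URL elsewhere. After updating the URL, the app would
// fetch and render the content for `path` via XHR (not shown).

// Pure helper: the hashbang form of a real path, e.g. "/article/1".
function hashbangFor(path) {
  return "/#!" + path;
}

function navigateTo(path) {
  if (typeof history !== "undefined" && history.pushState) {
    // Real URL in the address bar; the server can serve it directly.
    history.pushState({ path: path }, "", path);
  } else if (typeof location !== "undefined") {
    // Older browsers (IE at the time): fall back to the fragment.
    location.href = hashbangFor(path);
  }
}
```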

[+] thomas11|15 years ago|reply
It's really great that in a few years, browsers will support a new AJAX technology that solves this problem that we wouldn't even have with sane, traditional URL schemes.
[+] davnola|15 years ago|reply
Is it ready for the mainstream?

Apart from not being supported in IE, the browsers that do support it still have quirks; e.g. your code has to manually track scroll state.

[+] mjs|15 years ago|reply
This solves two problems: (1) the only visible/bookmarkable URLs are those without a #!; and (2) initial page loads can be fulfilled by a single request to the server. It doesn't solve the problem of URL discovery, but two out of three ain't bad.
[+] JoachimSchipper|15 years ago|reply
Well, it "solves" it - you still have to download and parse a ton of Javascript before you even begin downloading the data...
[+] bruceboughton|15 years ago|reply
Isn't the underlying problem that web applications are often displaying combinations of content that don't have a natural URL?

Take New Twitter, for example. If I click on a tweet in my stream, it shows related tweets. If I drill down a few of those, at some point it becomes impossible to represent the address of the current state in a sane manner.

I think URLs are particular to the web (desktop apps don't have them) because the web is traditionally about content. Web applications are increasingly breaking that. Perhaps web applications and URLs don't go together all that well.

Don't get me wrong--I love URLs, and it's crazy for content sites like Lifehacker to break them for so little benefit. But maybe the reason for this hashbang trend is that URLs aren't expressive enough for some of these sites.

[+] prodigal_erik|15 years ago|reply
In that case "web application" is a misnomer. If the current state has no natural URL, it's not a legitimate part of the World-Wide Web. Instead the authors are tunneling a proprietary protocol over AJAX to carry opaque content to a single-purpose GUI app, just like all the terrible client/server apps from the 90s only slower.
[+] Isofarro|15 years ago|reply
Ran into another interesting shortcoming of hash-bang URLs last night looking through my referrer log: loads of referring URLs from http://gawker.com/ and http://kotaku.com/ to my blog post, but no mention at all of my blog post, or a link to it, on the homepage.

First I thought they were referrer-log spamming; then it dawned on me that fragment identifiers get stripped out of HTTP referers, making hash-bangs useless as a means of joining up distributed conversations on the web.

Somewhere on those two Gawker media sites there's a conversation going on about the use of hash-bangs. But nobody outside knows about it. It's a big black hole.

[+] Bockit|15 years ago|reply
Can't it work both ways? Serve the #! links and provide canonical content located at the (almost) same URI sans #!.

If you visit http://mysicksite.com/article/1 javascript changes all the links to the #! format. Then when the user clicks the links they enter #! land.

Now the user copies a link from their address bar and puts it into the wild. Someone gets that link, http://mysicksite.com/#!/article/1, and visits it. Rewrite with htaccess or whatever method you employ to serve the content at http://mysicksite.com/article/1, using javascript to change all the links to the #! format.

I posted this in the reddit thread about the Gawker/lifehacker problems recently, but was too late for anyone to really give me a response. For those of you that have worked with these kind of systems before, would this solve the problem the original link was describing?

EDIT: Ahh, I think I get the problem now, of course right after I post. The server doesn't get the part of the URI after the #!, I think?

[+] s0urceror|15 years ago|reply
That is, indeed, the crux of the problem. Anything after the hash is client-only.
[+] jvdongen|15 years ago|reply
[EDIT: never mind, missed this response, similar in style but 2h earlier ... http://news.ycombinator.com/item?id=2197064]

Maybe I'm missing something, but it seems to me that there is a way to have your cake and eat it too in this case.

Say we have a site with a page /contacts/ which lists various contacts.

On this page there are completely normal links like '/contacts/john/', each link preceded by/wrapped by an anchor tag - <a href="john"> in this case.

If you visit this site without javascript enabled (e.g. you happen to be a web crawler), you just follow the links and you get just regular pages as always.

If however you have JavaScript enabled, an onclick handler on each link intercepts the click, fetches just the information about the contact you clicked on (using an alternate URL, for example /contacts/john.json), cancels the default action, and (re)renders the page.

Then it does one of two things:

- if pushState is supported, it just updates the URL
- if pushState is not supported, it adds '#john' to the URL

If someone visits '/contacts/#john' with javascript enabled, /contacts/ is retrieved and then john's data is loaded and displayed.

If someone visits '/contacts/#john' without javascript enabled, he gets the full contact list, with the focus on the link to john's page, which he can then click.

By using this scheme:

- search engines and other non-JavaScript users can fully use the site and see completely normal URLs
- XHR page loads are supported
- XHR-loaded pages don't break the back button
- XHR-loaded pages are bookmarkable
- bookmarks to XHR-loaded pages are fully shareable if the recipient has JavaScript enabled or pushState is supported, and at least not totally broken if not

The only drawback I can see is the 'sharing bookmarks with someone who has no JavaScript support' issue; is that a real biggie? In addition, of course, to the 'made an error in JavaScript, now it all stops working' issue, but that has not so much to do with the #! debate as with the 'is loading primary content via XHR a good idea' debate.

To me it seems that current users of the #! technique have just gone overboard a bit by relying only on the #! technique instead of combining it in a progressively enhancing way with regular HTTP requests.
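
A rough sketch of the scheme described above; the URLs follow the /contacts/ example, and the render() step is hypothetical:

```javascript
// Progressive enhancement sketch: plain links work without JS; with JS,
// clicks are intercepted and data is fetched from a JSON endpoint.

// Pure helpers, e.g. "/contacts/john/" -> "/contacts/john.json" and "#john".
function jsonUrlFor(href) {
  return href.replace(/\/$/, "") + ".json";
}
function hashFor(href) {
  const parts = href.replace(/\/$/, "").split("/");
  return "#" + parts[parts.length - 1];
}

// Browser-only wiring (guarded so the helpers stay testable elsewhere).
if (typeof document !== "undefined") {
  document.addEventListener("click", function (e) {
    const a = e.target.closest("a[href]");
    if (!a) return;
    e.preventDefault(); // cancel the default navigation
    fetch(jsonUrlFor(a.getAttribute("href")))
      .then((r) => r.json())
      .then((data) => {
        render(data); // hypothetical re-render of the page
        if (history.pushState) {
          history.pushState(null, "", a.getAttribute("href"));
        } else {
          location.hash = hashFor(a.getAttribute("href"));
        }
      });
  });
}
```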

[+] aamar|15 years ago|reply
The problem in this situation is that you have a smart technical person arguing for technical purity, while at the same time (seemingly) ignoring the mostly non-technical considerations of user-experience and economics.

Yes, the old, conservative model of HTML is very simple, but when people use AJAX well, the user experience is enormously and materially improved. We're still early in the development of this medium, and many people will do it wrong. But even the people who do it right will probably seem inelegant and kludgey by the standards of the old model.

And yes, you can get both AJAX and clean URLs via (still poorly-supported) HTML5 History API and/or other progressive enhancement methods, but these may require a significant amount of additional effort. Maybe worth it, maybe not.

This topic reminds me of when sound was added to movies. "Tight coupling" and "hideous kludge" sound a lot like the arguments that were made against that too. The conventional wisdom was to make your talkie such that the story worked even without sound; one can still sometimes hear that, but it isn't, I think, a standard that we associate with the best movies being made today.

[+] nostrademons|15 years ago|reply
It's not really that bad. The people using hash-bangs are following a spec proposed by Google to make AJAX webpages crawlable:

http://code.google.com/web/ajaxcrawling/docs/specification.h...

So when you see the lifehacker URL in the article, you know that there's an equivalent non-AJAX URL available with the same content at:

http://lifehacker.com/?_escaped_fragment_=5753509/hello-worl...

There's no need to execute all the JavaScript that comes back from the server - if they're following the spec, all you have to do is escape the fragment and toss it over to a CGI arg.
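
The crawler-side translation is a simple string rewrite. A sketch, using encodeURIComponent as a simplification of the spec's exact escaping rules:

```javascript
// Map a "#!" URL to the "_escaped_fragment_" form from Google's
// AJAX-crawling scheme, which the server can answer with static HTML.
function escapedFragmentUrl(url) {
  const i = url.indexOf("#!");
  if (i === -1) return url; // nothing to translate
  const base = url.slice(0, i);
  const fragment = url.slice(i + 2);
  const sep = base.includes("?") ? "&" : "?";
  return base + sep + "_escaped_fragment_=" + encodeURIComponent(fragment);
}
```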

Another option is progressive enhancement, where you make every link point to a valid page and then add onclick event handlers that override the click event to do whatever JavaScript you want it to. I think this is a far superior option in general, but it has various issues in latency and coding complexity, so a good portion of web developers didn't do it anyway.

[+] brown9-2|15 years ago|reply
But as Tim says, the spec proposed by Google is only meant to fix some problems (content that can't be found by search engines) caused by using this URL scheme. It isn't meant to be a one-guide-fits-all approach to making AJAX content addressable.

In other words the spec treats one of the symptoms, not the original problem.

[+] vanessafox|15 years ago|reply
I posted more as a comment on the original story, but I have covered this issue in depth (from when Google initially proposed it, to when it was launched) here:

http://searchengineland.com/google-proposes-to-make-ajax-cra...

http://searchengineland.com/googles-proposal-for-crawling-aj...

http://searchengineland.com/its-official-googles-proposal-fo...

Of course, a better solution is some type of progressive enhancement that ensures both that search engines can crawl the URLs and that anyone using a device without JavaScript support can view all of the content and navigate the site.

[+] rushabh|15 years ago|reply
I can't understand how hard it would be for someone writing a crawler to replace a hashbang (#!) with _escaped_fragment_.

For developers of AJAX apps it:

1. Improves productivity
2. Improves user experience
3. Is more efficient on the server, as it prevents a lot of initializing code

I think the old school needs to wake up a bit!

[+] alexkearns|15 years ago|reply
Yet another annoying, pontificating article about hashbangs. Why can't people accept that there is more than one way of doing things on the web?

Just because you don't like using hashbangs does not mean no-one else can.

Sure, use of hashbangs might make SEO of your site harder. Yes, it might make it harder for hackers who want to curl your site's pages. But maybe this is not your aim with your site.

Maybe you want to give your users a slicker experience by not loading whole new pages but instead grabbing bits of new content.

The web is a place for experimentation and we as hackers should encourage such experimentation, rather than condemning it because it does not fit with how we think things should be done.

[+] andolanra|15 years ago|reply
A while back, there was this pie-in-the-sky idea which was really interesting but not too practical, called Semantic Web. It didn't really pan out because it turns out that annotating your sites with metadata is boring and tedious and nobody really liked to do it, and anyway, search and Bayesian statistics simulated the big ideas of Semantic Web well enough for most people.

The ideas behind it still stand, though, in the idea of microformats. These are just standardized ways of using existing HTML to structure particular kinds of data, so any program (browser plug-in, web crawler, &c) can scrape through my data and parse it as metadata, more precisely and with greater semantic content than raw text search, but without the tedium that comes with ontologies and RDF.

Now, these ideas are about the structured exchange of information between arbitrary nodes on the internet. If every recipe site used the hRecipe microformat, for example, I could write a recipe search engine which automatically parses the given recipes and supply them in various formats (recipe card, full-page instructions, &c) because I have a recipe schema and arbitrary recipes I've never seen before on sites my crawler just found conform to this. I could write a local client that does the same thing, or a web app which consolidates the recipes from other sites into my own personal recipe book. It turns the internet into much more of a net, and makes pulling together this information in new and interesting ways tenable. In its grandest incarnation, using the whole internet would be like using Wolfram Alpha.

The #! has precisely the opposite effect. If you offer #! urls and nothing else, then you are making your site harder to process except by human beings sitting at full-stack, JS-enabled, HTML5-ready web browsers; you are actively hindering any other kind of data exchange. Using #!-only is a valid choice, I'm not saying it's always the wrong one—web apps definitely benefit from #! much more than they do from awkward backwards compatibility. But using #! without graceful degradation of your pages turns the internet from interconnected-realms-of-information to what amounts to a distribution channel for your webapps. It actively hinders communication between anybody but the server and the client, and closes off lots of ideas about what the internet could be, and those ideas are not just "SEO is harder and people can't use curl anymore."

I don't want to condemn experimentation, either, and I'm as excited as anyone to see what JS can do when it's really unleashed. But framing this debate as an argument between crotchety graybeards and The Daring Future Of The Internet misses a lot of the subtleties involved.

[+] tlack|15 years ago|reply
You can still avoid loading whole new pages. You simply attach Javascript events to your anchor tags and do whatever Ajax content trickery you want that way. The page content itself is maximally flexible and useful to all agents if the URLs inside of it are actual URLs.
[+] rix0r|15 years ago|reply
True, but the same non-reload could be accomplished with:

  <a href="realurl.html" onclick="javascript_magic(); return false">
And wouldn't break spiders.
[+] joelanman|15 years ago|reply
totally agree - if using hashbangs provides the best experience for your context, why not use it?
[+] garrettgillas|15 years ago|reply
The point of mainstream sites indicating AJAX in the URL path is to tell search engines. I have a feeling that what the author doesn't get is that it is very hard for search engines to tell the difference between AJAX pages, static pages, and spammy keyword-stuffed pages.

To me, it seems that Google recommends indicating AJAX content in the path in the same way that our government issues concealed-weapon permits: yes, it's okay to have concealed content that loads on the fly, as long as you are very clear about your intentions. Once again, this is a usability issue that wouldn't be an issue if it weren't for spammers.

[+] zachbeane|15 years ago|reply
This rant would be more effective and persuasive if also directed at the Google engineers who made this hashbang style pervasive in Google Groups. I didn't think it would be possible to get deep links to old articles even worse than before, but they managed it.
[+] il|15 years ago|reply
It's interesting how many upvotes this is getting in a very short time. However, I don't think the average Twitter user cares about performance and URL elegance, so I doubt Twitter will change anything.
[+] jamesjyu|15 years ago|reply
I have seen performance issues and outright broken behavior with Twitter's hashbang ajax loading scheme. In that respect, regular users will care (they just won't necessarily know what is causing the issue).
[+] macrael|15 years ago|reply
Probably not, but that doesn't mean people who do care shouldn't discuss the implications, or that Twitter shouldn't think there is a problem.
[+] guelo|15 years ago|reply
Considering that twitter is the main reason for the spread of the abomination that is URL shorteners you're probably right. They don't seem to care about the health of the web.
[+] zaius|15 years ago|reply
I think people are missing a huge benefit of the hashbang syntax: readable and copy/paste-able URLs. Without them, it's impossible to have an ajax application with a decent URL scheme.
[+] masklinn|15 years ago|reply
You don't need the hashbang for that. You never did. Hashbang only tells google "munge around this shit to get an actual page".
[+] jamesjyu|15 years ago|reply
I think the real question here is whether the application should be loading the main content via AJAX in the first place. Tim argues here that it should not, for this use case.
[+] jcfrei|15 years ago|reply
Just a thought - but could a lot of people complaining about hashbangs still be browsing the web with lynx?
[+] dtby|15 years ago|reply
Hi, HTML/HTTP is the second-worst application delivery platform available. Try not to be shocked.

Sorry, your other choice was #1.