This is a great approach, but detecting the user-agent is the wrong way to decide if you should pre-render the page. If you include the following meta tag in the header:
<meta content="!" name="fragment">
then Google will request the page with the "_escaped_fragment_" query param. That's when you should serve the pre-rendered version of the page.
Waiting for Google to request the page with _escaped_fragment_ should also prevent you from getting penalized for slow load times or for showing Googlebot different content.
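That check needs no User-Agent sniffing at all. A minimal sketch of the routing decision (the `mapCrawlerUrl` helper is hypothetical, not part of any framework): if the `_escaped_fragment_` parameter is present, reconstruct the `#!` URL the crawler wants a snapshot of and serve the pre-rendered version; otherwise serve the normal JS app.

```javascript
// Sketch: handling Google's AJAX-crawling scheme without User-Agent sniffing.
// When a page carries <meta name="fragment" content="!">, the crawler
// re-requests it with an _escaped_fragment_ query parameter.
function mapCrawlerUrl(requestUrl) {
  const url = new URL(requestUrl, 'http://example.invalid');
  const fragment = url.searchParams.get('_escaped_fragment_');
  if (fragment === null) return null; // ordinary request: serve the JS app
  url.searchParams.delete('_escaped_fragment_');
  // The hash-bang URL whose snapshot the crawler is asking for.
  return url.pathname + url.search + '#!' + fragment;
}
```

Returning `null` for ordinary requests means real users always get the live app; only crawler requests get routed to a snapshot.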
No, it is not. While this will certainly help your client side app get indexed, it is not 'great'. Other commenters on this thread bring up a number of valid concerns, but in my mind it comes down to two very simple things.
One is that when you are fighting for the top spot in organic traffic, this won't cut it. Off-page SEO is more important than on-page optimizations, but on-page optimizations still have value.
The other issue is that this approach assumes that the client side rendered view at a particular hash is exactly what should be initially rendered on the server side. While this could work in some cases, it is my experience that it either creates a weird user experience and/or you end up doing hacks on the client side in order to ensure PhantomJS captures the right html.
This is a fine solution for some use cases, but I really hope that the community doesn't think this is the future. This is a temporary hack until we get a good server/client rendering framework in place OR all search engines evolve to capture pure client side apps without any of this.
This is a great point. It might seem extreme, but I would advocate never using the User-Agent string to make decisions about what to serve a client. There is too much hackery and history that clouds up the User-Agent (such as every browser identifying itself as Mozilla), and it's almost always a proxy for something else that you actually want to test for.
In some rare situations, it's unavoidable, but even then I'd urge trying to rearchitect the solution to avoid it.
Rendering different content based on user agent is tempting the webspam gods. Rendering nothing but a big gob of javascript to non-googlebot user agents is a recipe to get the banhammer dropped on your head.
You're either gambling that Google is smart enough to know that your particular big gob of javascript isn't cloaking keyword spam (in which case you should just depend on their JS evaluation, since you already are, implicitly), or you're gambling that they won't bust you even though your site looks like a classic keyword stuffer.
"JavaScript: Place the same content from the JavaScript in a <noscript> tag. If you use this method, ensure the contents are exactly the same as what’s contained in the JavaScript, and that this content is shown to visitors who do not have JavaScript enabled in their browser."
This was done for years with Flash sites and I never saw Google black list anyone doing it legitimately.
You can also provide different content if you want the content to be behind a paywall, although personally I find this a little annoying.
Google does have a section within their guidelines on creating "HTML Snapshots". "If a lot of your content is created in JavaScript, you may want to consider using a technology such as a headless browser to create an HTML snapshot." https://developers.google.com/webmasters/ajax-crawling/docs/...
Hiding keyword spam behind JS doesn't make any sense in this situation - the whole point is that the JS isn't being served to Google. That's who keyword spammers are trying to fool, not actual humans.
This will get you penalized for having a website that takes forever to load. This is what happens:
Googlebot requests page -> your webapp detects googlebot -> you call a remote service and request that they crawl your website -> they request the page from you -> you return the regular page, with JS that modifies its look and feel -> the remote service returns the final HTML and CSS to your webapp -> your webapp returns the final HTML and CSS to Googlebot. That's gonna be just murder on your load times.
If this must be done, for static pages, it should be done by grunt during build time, not by a remote service. For dynamic content, it's best to do the phantomjs rendering locally, and on an hourly (or so) schedule, since it doesn't really matter if googlebot has the latest version of your content.
Or perhaps I'm mistaken and the node-module actually calls the service hourly or so and caches results on app so it doesn't actually call the service during googlebot crawls. If that's the case, I take back my objections, but I'd recommend updating the website to say as much.
If it doesn't cache, then besides latency, someone could send fake googlebot requests and overload the prerender service, which is unlikely to be able to handle a lot of traffic.
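If you do run the renderer yourself, the hourly-cache idea can be sketched as below (a hypothetical in-memory cache, not prerender's actual code). It also blunts the fake-Googlebot attack: repeat requests hit the cache instead of spawning PhantomJS.

```javascript
// Sketch of an hourly snapshot cache: render at most once per TTL window,
// and serve the cached HTML to every crawler request in between.
class SnapshotCache {
  constructor(render, ttlMs = 60 * 60 * 1000) {
    this.render = render;     // async (url) => html, e.g. a PhantomJS wrapper
    this.ttlMs = ttlMs;
    this.entries = new Map(); // url -> { html, expires }
  }
  async get(url, now = Date.now()) {
    const hit = this.entries.get(url);
    if (hit && hit.expires > now) return hit.html; // fresh: no render needed
    const html = await this.render(url);           // stale or missing: re-render
    this.entries.set(url, { html, expires: now + this.ttlMs });
    return html;
  }
}
```

The `now` parameter is only there to make the TTL logic easy to exercise; in production you would let it default to `Date.now()`.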
An entire project written to simulate progressive enhancement (badly). One that only works for specified whitelisted User-Agents, instead of being based on capability.
I'm also not understanding the use-case for this project. Every time the topic of "Web Apps", "JavaScript Apps", or "Single page web apps" comes up, evangelists point out that they are applications (or skyscrapers), not just fancy decorators for website content.
So exactly what is this project delivering as fallback content? A server-generated website?
This project just seems pointlessly backwards. Simulating a feature that the JavaScript framework has already deliberately broken. One that introduces a server-side dependency on a project deliberately chosen not to have a server-side framework.
This just looks like a waste of effort, when building the JavaScript application properly the first time, with progressive enhancement, covers this exact use-case, and far, far more use-cases.
The time would have been better spent fixing these evidently broken JavaScript frameworks - Angular, Ember, Backbone. Or at least fixing the tutorial documentation to explain how to build Web things properly. (This stuff isn't difficult, it just requires discipline.)
I call hokum on people saying there's a difference between Websites and Web apps (or the plethora of terms used to obfuscate that: Single-page apps, JavaScript apps). This project proves that these are just Websites, built improperly, and this is the fudge that tries to repair that for Googlebot.
Why some developers are so against progressive enhancement mystifies me. It is an elegant solution that actually works in all cases rather than an ugly hack that should probably work in the majority of cases. How can there even be a dispute about it? It's insane!
What would you do if you required SEO enhancement AND dynamic loading of content? Are you supposed to just let that portion of the site go without indexing? Surely there are sites that have both requirements.
If you are able to "pre-render" a JavaScript app like this, then you should be serving users the pre-rendered version and then enhancing it with JavaScript after onload.
JavaScript-only apps are a blight on the web. All it takes is a bad SSL cert, or your CDN going down, and your pages become useless to the end-user.
Google doesn't like it when they are shown different content than a browsing user sees. This is roughly the equivalent of pointing Googlebot at a copy of the requested page that happens to be in Memcached, instead of spinning up the full app stack to do the render.
>Static rendering of dynamic content? I don't think this does make sense.
Bro do you even Web 1.0? That's what CGI scripts in Perl did! Pull the data from the database, generate HTML (no JavaScript back then!) on the fly, and send to the browser.
I was under the impression that Googlebot already executes javascript on pages.
A more interesting idea would be if you do this for every user - prerender the page and send them the result, so they don't have to do the first, heavy js execution themselves. I know it sounds a bit retarded at first - you're basically using javascript as a server-side page renderer, but think about this: You can choose to prerender or not to prerender based on user agent string -- do it for people on mobile phones, but not for desktop users. You can write your entire site with just client-side page generation with javascript and let it run client-side at first, then switch to server-side prerendering once you have better hardware.
Something similar to that, albeit slightly more elegant, is the work that AirBnB has done with their rendr [0] project, which serves prerendered content that's then rerendered with JS if it needs to be changed. You can do similar things with non-Backbone stacks, of course.
It's a multipage app, that uses ajax to function as a singlepage app. From the user's point of view it's a singlepage app, but it's accessible from any of the URLs that it pushStates to, so it's like the best of both worlds. It's fully crawlable because it functions as a multipage app, but it's got the speed of a singlepage app (if your browser supports pushState)
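One way to picture that hybrid (purely illustrative; the route table and names are made up): the same routing function answers full-page requests on the server and pushState navigations on the client, so every URL works as a standalone, crawlable page.

```javascript
// Sketch: a single route table shared by the server-rendered multipage app
// and the pushState single-page layer, so every pushState URL is also a
// real page a crawler can fetch.
const routes = [
  { pattern: /^\/$/,             render: () => '<h1>Home</h1>' },
  { pattern: /^\/items\/(\d+)$/, render: (m) => `<h1>Item ${m[1]}</h1>` },
];
function renderPath(path) {
  for (const r of routes) {
    const m = r.pattern.exec(path);
    if (m) return r.render(m);
  }
  return '<h1>Not found</h1>';
}
// On the client, a pushState click handler calls renderPath(location.pathname)
// and swaps the result into the page; on the server, the same function answers
// full-page requests, so crawlers and no-pushState browsers see real pages.
```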
I tried using phantomjs in the past to serverside render a complex backbone application for SEO, and it was taking over 15 seconds to return a response (which is bad for SEO).
Looking at prerender's source I didn't see any caching mechanism.
What kind of load times have you seen rendering your apps?
Have there been recent significant improvements in phantomjs's performance?
You can get it faster than 15 seconds, but you can't really get it fast enough. We precache everything. I would strongly recommend against trying to process the pages in realtime.
I have been looking for something like this for a long time. Seems very straightforward.
I have not tested it yet, but I wonder if the speed of render will penalize you in the google results. Seems like a separate machine with a good CPU might be worthwhile if you are going to run this.
They execute JavaScript in a limited fashion, so you should consider using what is suggested by Google itself: https://developers.google.com/webmasters/ajax-crawling/docs/... If you are using Angular, you will get your template displayed instead of the fully rendered page, with all the {{sitename}} placeholders visible.
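A cheap guard against shipping those unrendered templates to a crawler (a hypothetical check, assuming Angular-style double-brace bindings):

```javascript
// Reject snapshots that still contain unrendered {{ }} placeholders,
// e.g. when PhantomJS captured the page before Angular finished rendering.
function snapshotLooksRendered(html) {
  return !/\{\{[^}]*\}\}/.test(html);
}
```

If the check fails, re-render with a longer wait rather than serving the raw template to the crawler.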
It's a SaaS that is much more elaborate than this project (there is a year of development in it). We serve and crawl thousands of pages every day without any issues.
If you're using Rails have a look at https://github.com/seojs/seojs-ruby, it's a gem similar to prerender but it's using our managed service at http://getseojs.com/ to get the snapshots. There are also ready to use integrations for Apache and Nginx.
Some benefits of SEO.js to other approaches are:
- it's effortless, you don't need to set up and operate your own PhantomJS server
- snapshots are created and cached in advance so the search engine crawler won't be put off by slow page loads
I recently needed to do this for Google, but I wanted the rendering time and delivery of the page to be under 500ms, so I hacked up something that works with expressjs.
It uses PhantomJS but removes all the styles initially so the rendering time is much faster. (My Ember app was averaging 70ms to render, but I prefetch the page data.)
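The style-stripping trick amounts to filtering resource requests; a sketch of the decision function (the extension list is my own guess). In PhantomJS you would call something like this from `page.onResourceRequested` and abort matching requests:

```javascript
// Crawlers only need the markup, so skip stylesheet, font, and image
// resources to cut snapshot rendering time.
function shouldBlockResource(url) {
  return /\.(css|less|png|gif|jpe?g|ico|svg|woff2?|ttf)([?#]|$)/i.test(url);
}
```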
Looks like that's exactly what Meteor's spiderable package does since 08/2012 [0]: look at the user-agent, run PhantomJS for up to 10s, and return a rendered page once a Google/Facebook crawler is detected.
dwwoelfel | 12 years ago:
Google has documentation on this here: https://developers.google.com/webmasters/ajax-crawling/docs/... and we've been using this method at https://circleci.com for the past year.
Isofarro | 12 years ago:
Nick Denton: "Dip in uniques largely because of drop in Google refers. Pageviews (which are driven more by core audience) less affected." -- http://twitter.com/nicknotned/status/61152134929981440
Nick Denton: "Google does not fully support "hashbang" URLs. So we're eliminating them rather than waiting for Mountain View." -- http://twitter.com/nicknotned/status/61465859079671808
Nick Denton: "Yeah, I'd advise against hashbang urls. Will kill search traffic -- even if you abide by Google protocol." -- http://twitter.com/nicknotned/status/62595141927583745
thomasfromcdnjs | 12 years ago:
http://github.com/apiengine/seoserver
and the blog post related to it
http://backbonetutorials.com/seo-for-single-page-apps/
gcb1 | 12 years ago:
The issue with getting content from scripted sites is not the initial render... you could use noscript and be done much more easily.
The real issue is that most sites require user interaction to get to most of the content. This does nothing besides providing a convenient DoS entry point.
Nice hack, though.
robmcm | 12 years ago:
https://support.google.com/webmasters/answer/66353
"JavaScript: Place the same content from the JavaScript in a <noscript> tag. If you use this method, ensure the contents are exactly the same as what’s contained in the JavaScript, and that this content is shown to visitors who do not have JavaScript enabled in their browser."
This was done for years with Flash sites and I never saw Google black list anyone doing it legitimately.
You can also provide different content if you want the content to be behind a pay wall, although personally I find this is a little annoying.
[+] [-] stephenheron|12 years ago|reply
[+] [-] benaiah|12 years ago|reply
[+] [-] _lex|12 years ago|reply
Googlebot requests page -> your webapp detects googlebot -> you call remote service and request that they crawl your website -> they request the page from you -> you return the regular page, with js that modifies it's look and feel -> the remote service returns the final html and css to your webapp -> your webapp returns the final html and css to Googlebot. That's gonna be just murder on your loadtimes.
If this must be done, for static pages, it should be done by grunt during build time, not by a remote service. For dynamic content, it's best to do the phantomjs rendering locally, and on an hourly (or so) schedule, since it doesn't really matter if googlebot has the latest version of your content.
Or perhaps I'm mistaken and the node-module actually calls the service hourly or so and caches results on app so it doesn't actually call the service during googlebot crawls. If that's the case, I take back my objections, but I'd recommend updating the website to say as much.
[+] [-] benatkin|12 years ago|reply
[+] [-] 10098|12 years ago|reply
[+] [-] unknown|12 years ago|reply
[deleted]
[+] [-] ilaksh|12 years ago|reply
[+] [-] Isofarro|12 years ago|reply
I'm also not understanding the use-case for this project. Everytime the topic of "Web Apps", "JavaScript Apps", "Single page web apps" comes up, evangelists point out that they are applications (or skyscrapers), not just fancy decorators for website content.
So exactly what is this project delivering as fallback content? A server-generated website?
This project just seems pointlessly backwards. Simulating a feature that the JavaScript framework has already deliberately broken. One that introduces a server-side dependency on a project deliberately chosen not to have a server-side framework.
This just looks like a waste of effort, when building the JavaScript application properly the first time, with progressive enhancement, covers this exact use-case, and far, far more use-cases.
The time would have been better spent fixing these evidently broken JavaScript frameworks - Angular, ember, Backbone. Or at least to fix the tutorial documentation to explain how to build Web things properly. (This stuff isn't difficult, it just requires discipline)
I call hokum on people saying there's a difference between Websites and Web apps (or the plethora of terms used to obfuscate that: Single-page apps, JavaScript apps). This project proves that these are just Websites, built improperly, and this is the fudge that tries to repair that for Googlebot.
[+] [-] philbo|12 years ago|reply
Why some developers are so against progressive enhancement mystifies me. It is an elegant solution that actually works in all cases rather than an ugly hack that should probably work in the majority of cases. How can there even be a dispute about it? It's insane!
raynjamin | 12 years ago:
What's the alternative?
dchest | 12 years ago:
How are non-JavaScript pages protected from this?
eonil | 12 years ago:
If it's pre-rendered, it's missing something. If it has all the data at first, then it's not dynamic.
A pre-rendered (static) JavaScript app (dynamic)...? Hmm... I don't see anything more than something like JWT in JS instead of Java?
dchest | 12 years ago:
Yes.
> I don't think this does make sense.
It does, if you use one of the JS frameworks listed on the linked page.
benaiah | 12 years ago:
[0]: https://github.com/airbnb/rendr
pzxc | 12 years ago:
https://news.ycombinator.com/item?id=6507135
gkoberger | 12 years ago:
Google is less important (they already execute JS); however, it's good for sites like Facebook (which doesn't execute JS when you share a link).
se_ | 12 years ago:
One more SEO.js benefit: snapshots are updated regularly.
chadscira | 12 years ago:
https://github.com/icodeforlove/node-express-renderer
RoboTeddy | 12 years ago:
http://docs.meteor.com/#spiderable
imslavko | 12 years ago:
[0]: http://www.meteor.com/blog/2012/08/08/search-engine-optimiza...