item 38832257

Show HN: Page Replica – Tool for Web Scraping, Prerendering, and SEO Boost

135 points | nirvanist | 2 years ago | github.com

54 comments

[+] colesantiago|2 years ago|reply
It seems that in the AI era, SEO is starting to become irrelevant, a relic of the 2000-2020 era.

Why is SEO still needed here when AI / LLMs can just conjure up answers with references to valid links, bypassing search engines?

Even privacy-based search engines like DuckDuckGo, Brave and Kagi don't prioritise 'SEO'.

[+] spinningslate|2 years ago|reply
>Why is SEO still needed here when AI / LLMs can just conjure up answers with references to valid links, bypassing search engines?

In short: money. LLMs will no doubt change the implementation, but the commercial dynamics are fundamentally the same. It's expensive to build and run a search engine, whether conventional or LLM-based. Someone has to pay for that - and it's not search users. Advertising and its derivatives have become that revenue source, with all the good and bad that brings with it. As long as that commercial dynamic remains, there'll be SEO or some derivative thereof.

--

Other than Kagi - but that's a tiny niche.

[+] quickthrower2|2 years ago|reply
Long term you may be right.

SEO isn’t dead, but it will slowly die off. Being a reference link is second place, and by that point you only get visited if the AI wasn't trusted or didn't solve the problem.

Therefore I think viral/word-of-mouth or links from other engaged sources will become more relevant.

Right now though why lose out on free SEO traffic just because you used JS to render most of your site?

[+] imiric|2 years ago|reply
How will AI tools know which sources are "valid"? It's likely SEO will transform into ways of tricking bots that scrape training data into treating a site's information as more "valid".

Alternative search engines must rely on AI themselves to filter out good results, or some form of manual curation by humans, like Kagi's boost/block/pin feature.

[+] 8organicbits|2 years ago|reply
Because AI doesn't provide accurate information and you need to validate it yourself? Has anyone who cared about SEO stopped recently?
[+] la_fayette|2 years ago|reply
I think we must distinguish between on-page and off-page SEO. This proposal is only relevant for on-page SEO, for which I would mostly agree with your comment. However, inbound links are and will remain the most important signal for search. What else would be left for ranking?
[+] wongarsu|2 years ago|reply
I still use Google (and DDG and Kagi), so people who want to sell me stuff try to get better rankings in these search engines. I'd also wager that people who primarily use LLMs to answer their questions are only a rounding error.
[+] paulcole|2 years ago|reply
In the AI era, which services provide up to date information about local queries — like which dentists near me are open today?
[+] rgrieselhuber|2 years ago|reply
SEO has been “dead” since the late 90s.
[+] lnxg33k1|2 years ago|reply
https://github.com/html5-ninja/page-replica/blob/main/api.js...

This code base has the most useful comments ever. Are these normally accepted? Enforced? Adding stuff that has no value, but that needs to be maintained and updated when the code changes, without the ability for it to be validated by compilers or parsers?

[+] nojs|2 years ago|reply
These comments look a lot like GPT to me
[+] ssgodderidge|2 years ago|reply
Curious: how would caching the pages and serving them over NGINX help with SEO? Are there any benefits over serving a static site?
[+] nirvanist|2 years ago|reply
Yes, Google can render your content handled by JavaScript, but Googlebot allows only a few milliseconds for rendering. If your page isn't rendered within that time frame, it may be penalized.

For this reason, many news and broadcasting media outlets still use prerendering services. I speak from experience as I worked in a large Canadian media company.

Another important factor to consider is that the SEO world isn't limited to Google. Various bots, including those from other search engines and platforms like Facebook, require correctly rendered pages for optimal sharing and visibility.

Lastly, the choice between client-side rendering (CSR) and server-side rendering (SSR) depends on your specific needs. Google Search Console provides valuable metrics and information about your app, so it might be worth considering SSR if that better aligns with your requirements.

[+] jotto|2 years ago|reply
A "static site" implies HTML rather than a JavaScript app.

With respect to JavaScript apps (React, Angular, etc.):

It's not clear these days, because the major search engines don't explicitly clarify whether they parse JavaScript apps (or whether they only parse high-ranking JS apps/sites). But 10 years ago it was a must-have to be indexed.

One theory on pre-rendering is that it reduces cost for the crawlers, since they don't need to spend 1-3s of CPU time rendering your site. And by reducing costs, it may increase the chances of being indexed or ranked higher.

My hunch is that long-term, pre-rendering is not necessary for getting indexed. But it is typically still necessary for URL unfurls (link previews) for various social media and chat apps.

disclosure: I operate https://headless-render-api.com

[+] nodja|2 years ago|reply
If your web app serves dynamic routes (i.e. client only) this helps with SEO because those routes are now directly visible through most crawlers.
[+] xnx|2 years ago|reply
What's the use case? Scrape someone else's dynamic site and serve it statically as your own?
[+] binarymax|2 years ago|reply
I read it as being able to develop your site in whatever you want, then scrape it and publish it as a static, SEO-optimized site.
[+] nirvanist|2 years ago|reply
For me that's not the purpose, but you can do it if you want.

The use case for me was that Meteor.js apps are poorly SEO-friendly, and I needed prerendered HTML to serve to bots.

[+] janjones|2 years ago|reply
I have done something similar when archiving a dynamic site, serving it as static snapshot for free.
[+] quickthrower2|2 years ago|reply
Cool! I thought about this idea as an IaaS! A CDN that can figure out your rendered page and serve it. Regardless of tech stack.
[+] Lukkaroinen|2 years ago|reply
I wonder where this is actually needed, since most React frameworks support metadata with server-side rendering.
[+] nirvanist|2 years ago|reply
Thank you for the comment. While it's true that not all web applications use the React library, it's important to note that Next.js inherently supports React. However, the choice of technology stack depends on the specific use case and requirements of your project.
[+] mrtksn|2 years ago|reply

[deleted]

[+] pmx|2 years ago|reply
> In places where JS is truly useful, that is when the UI is more than a text document, SEO is not a concern or possibility.

I don't think you're thinking about this. e-commerce is more than just a document and SEO is massively important there but javascript makes the user experience miles better; Think product variation selectors, bulk pricing selectors, product filtering, realtime cart, etc, etc. It's insane to say we shouldn't use new tech so that the search engines can index us, do we just forever stick with what we had 15 years ago and never progress? Madness.