cmroanirgo | 3 years ago

Although I agree heartily with the idea of a push model for search engines, I can't help but notice that it seems to provide more centralisation to the search engines out there.

Here on HN we've been seeing posts of alternate search engines. How will those small bespoke engines make use of IndexNow unless the website participates?

The way I see IndexNow, I'll still get crawled relentlessly by the bots I don't want crawling my site (because robots.txt never seems to apply to them unless there's an explicit listing just for them).

So, unless a crawler belongs to a participating search engine, a website will still get crawled the old way, and the problem isn't alleviated.

A good compromise would be something like an RSS feed, which a site can publish, and crawlers can hit for updated changes. It would also allow easier management for those domains that have many moving parts: individual search engines can be pinged, but the search engine just grabs the changes.xml file... Or something.
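A minimal sketch of such a changes feed in Python. Everything here is hypothetical (the `/changes.xml` path, the element names, the tuple format) since no such spec exists; the point is just that a site publishes one dated list of modified URLs and any crawler, big or small, can poll it:

```python
# Hypothetical "changes feed" a site could publish at /changes.xml:
# a dated list of recently modified URLs that any crawler can poll,
# instead of the site having to ping each search engine individually.
import xml.etree.ElementTree as ET

def build_changes_feed(changes):
    """changes: list of (url, iso_date) tuples for recently modified pages."""
    root = ET.Element("changes")
    for url, modified in changes:
        entry = ET.SubElement(root, "page")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = modified
    return ET.tostring(root, encoding="unicode")

feed = build_changes_feed([
    ("https://example.com/post-1", "2022-08-29"),
    ("https://example.com/about", "2022-08-28"),
])
print(feed)
```

This is essentially what a sitemap with `<lastmod>` already gives you, which the reply below points out.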

rstupek | 3 years ago

It looks like a search engine could get listed here: https://www.indexnow.org/searchengines.json and any website which implements IndexNow could utilize that list to know where to publish?
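A sketch of what that could look like. The documented IndexNow submission is a GET to `/indexnow` with `url` and `key` parameters; I'm assuming here that searchengines.json is a plain JSON array of hostnames — check the live file before relying on that shape:

```python
# Sketch: notify every engine listed in searchengines.json about a changed URL.
# Assumption: the file is a JSON array of hostnames (e.g. ["www.bing.com", ...]).
import urllib.parse
import urllib.request

def build_ping_url(engine_host, page_url, key):
    # IndexNow's basic submission: GET https://<engine>/indexnow?url=...&key=...
    params = urllib.parse.urlencode({"url": page_url, "key": key})
    return f"https://{engine_host}/indexnow?{params}"

def notify_all(engines, page_url, key):
    for host in engines:
        ping = build_ping_url(host, page_url, key)
        urllib.request.urlopen(ping)  # fire-and-forget; real code should check the status

# In practice, `engines` would be fetched and parsed from
# https://www.indexnow.org/searchengines.json rather than hardcoded.
engines = ["www.bing.com"]
print(build_ping_url(engines[0], "https://example.com/new-post", "abc123"))
```

The key is a token the site also hosts at a well-known location, which is how the engine verifies the pinger actually controls the domain.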

There already is such an "RSS" feed: it's called a sitemap, available at /sitemap.xml, or you can alternatively list its URL in your robots.txt file.
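For completeness, the robots.txt variant is a single `Sitemap:` directive (example.com is a placeholder):

```
User-agent: *
Sitemap: https://example.com/sitemap.xml
```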

firecall | 3 years ago

The issue with that approach is the same one that destroyed the trust in meta keywords!

The lack of trust means a search engine needs to know if what it's being presented in metadata is actually what's being served to the browser!

That's why we can't have nice things! :-)