(no title)
burnhamup | 2 years ago
Google's suggestion isn't to delete pages, but to consider marking some pages with a noindex header.
https://developers.google.com/search/docs/crawling-indexing/...
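For illustration, a minimal sketch of what marking a page that way could look like on the server side. `X-Robots-Tag: noindex` is the real response header; the age-based policy, the function name `robots_headers`, and the `max_age_days` threshold are assumptions made up for this example, not something Google recommends.

```python
from datetime import date


def robots_headers(published: date, today: date, max_age_days: int = 3650) -> dict[str, str]:
    """Return extra HTTP response headers for a page.

    Hypothetical policy: pages older than max_age_days are kept online
    but marked noindex instead of being deleted.
    """
    headers: dict[str, str] = {}
    if (today - published).days > max_age_days:
        # Asks search engines not to index the page; the page itself still serves.
        headers["X-Robots-Tag"] = "noindex"
    return headers


if __name__ == "__main__":
    print(robots_headers(date(2010, 1, 1), date(2023, 8, 1)))
    print(robots_headers(date(2023, 1, 1), date(2023, 8, 1)))
```

The page stays reachable for readers and inbound links either way; only the indexing hint changes.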
crazygringo|2 years ago
That's for stuff like large e-commerce sites with constantly changing product info.
Google is clear that if your content doesn't change often (in the way that news articles don't), then crawl budget is irrelevant.
snowwrestler|2 years ago
It’s easy to change millions of pages once a week with on-load CMS features like content recommendations. Visit an old article and look at the widgets around the page: related articles, most read, read this next, etc. They’ll be showing current content, which changes frequently even if the old article text itself does not.
linkjuice4all|2 years ago
There’s some methodology to trying to direct Google crawls to certain sections of the site first - but typically Google already has a lot of your URLs indexed and it’s just refreshing from that list.
codedokode|2 years ago
throw0101a|2 years ago
Once a site has been indexed, should it really be crawled again? Perhaps Google should search for RSS/Atom feeds on sites and poll those regularly for updates: that way they don't waste time doing a full site scrape multiple times.
Old(er) articles, once crawled, don't really have to be babysat. If Google wants to double-check that an already-crawled site hasn't changed too much, they can do a statistical sampling of random links on it using ETag / If-Modified-Since / whatever.
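A rough sketch of that sampling idea, using standard HTTP conditional requests. `If-None-Match` and `If-Modified-Since` are real request headers, and a `304 Not Modified` response means the stored copy is still current. The helper names, the use of `HEAD`, and the sample size are assumptions for illustration; nothing here reflects how Google's crawler actually works.

```python
import random
import urllib.error
import urllib.request


def sample_urls(urls, k, seed=None):
    """Pick up to k random URLs from an already-crawled site."""
    rng = random.Random(seed)
    return rng.sample(urls, min(k, len(urls)))


def conditional_request(url, etag=None, last_modified=None):
    """Build a HEAD request carrying the validators saved from the last crawl."""
    req = urllib.request.Request(url, method="HEAD")
    if etag:
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    return req


def is_unchanged(req, timeout=10):
    """Return True if the server answers 304 Not Modified."""
    try:
        with urllib.request.urlopen(req, timeout=timeout):
            return False  # 2xx: the server sent a fresh representation
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib surfaces 304 as an HTTPError
            return True
        raise
```

If most of the sampled URLs come back unchanged, the rest of the site could plausibly be skipped for that round. This only works when the server sends honest `ETag` or `Last-Modified` values in the first place.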
jrochkind1|2 years ago
No need to invent a new system based on RSS/Atom; there is already an existing, in-use system based on Sitemaps.
So, what you suggest is already happening -- or at least, the system is already there for it to happen. It's possible Google does not trust the last modified info given by site owners enough, or for other reasons does not use your suggested approach, I can't say.
https://developers.google.com/search/docs/crawling-indexing/...
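As a sketch of how a crawler could use that, the snippet below reads the `<lastmod>` values from a sitemap and returns only the URLs that claim a change since the last crawl. The XML namespace is the one defined by the Sitemaps protocol; the function name, the sample data, and the choice to compare dates only (ignoring time of day) are assumptions for the example.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the Sitemaps protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sample sitemap for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/old</loc><lastmod>2013-05-01</lastmod></url>
  <url><loc>https://example.com/new</loc><lastmod>2023-08-09T12:00:00+00:00</lastmod></url>
  <url><loc>https://example.com/unknown</loc></url>
</urlset>"""


def urls_modified_since(sitemap_xml: str, last_crawl: date) -> list[str]:
    """Return URLs whose lastmod date is after last_crawl.

    URLs without a lastmod are included, since nothing is known about them.
    """
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if not loc:
            continue
        # lastmod is a W3C datetime; the first 10 characters are YYYY-MM-DD.
        if not lastmod or date.fromisoformat(lastmod[:10]) > last_crawl:
            changed.append(loc)
    return changed


if __name__ == "__main__":
    print(urls_modified_since(SAMPLE, date(2023, 1, 1)))
```

Whether a crawler should trust the reported dates is a separate question, as noted above.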
jszymborski|2 years ago
Just a guess though.
influx|2 years ago
0cf8612b2e1e|2 years ago
Alternatively, other than ads, what is changing on a CNN article from 10 years ago? Why would that still be getting daily scans?
progmetaldev|2 years ago
kenjackson|2 years ago
em-bee|2 years ago
I am tracking RSS feeds of many sites, and on some I get notifications for old articles because something irrelevant in the page changed.
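One way a feed reader or crawler might avoid those false positives is to fingerprint only the article body and ignore the rest of the page. The sketch below assumes the body is wrapped in an `<article>` element, which many sites do not do; the class and function names are hypothetical.

```python
import hashlib
from html.parser import HTMLParser


class ArticleText(HTMLParser):
    """Collect text that appears inside <article> elements only."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <article> elements are currently open
        self.chunks = []  # text fragments found inside them

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def article_fingerprint(html: str) -> str:
    """Hash the whitespace-normalized article text, ignoring widgets around it."""
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Two fetches of the same page with different "most read" sidebars would then produce the same fingerprint, while an edit to the article text would not.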
bhandziuk|2 years ago
tedunangst|2 years ago
pessimizer|2 years ago
sznio|2 years ago
lkbm|2 years ago
Wikipedia has a long tail of low-value content, but even the low-value content tends to be among the highest value for its given focus. E.g., I don't know how many people search "Danish trade monopoly in Iceland", and the Wikipedia article on it isn't fantastic, but it's a pretty good start[0]. Good enough to serve up as the main snippet on Google.
[0] https://en.wikipedia.org/wiki/Danish_trade_monopoly_in_Icela...
snowwrestler|2 years ago
They’re just truly useful pages, and that is reflected in how people interact with them.
lmm|2 years ago
skissane|2 years ago
ericd|2 years ago
jesprenj|2 years ago
codedokode|2 years ago
nevi-me|2 years ago