CNET is deleting old articles to try to improve its Google Search ranking

801 points | mikece | 2 years ago | theverge.com

574 comments

[+] pessimizer|2 years ago|reply
So google's shitty search now economically incentivizes sites to destroy information.

Can there be any doubt that Google destroyed the old internet by becoming a bad search engine? Could their exclusion of most of the web be considered punishment for sites being so old and stable that they don't rely on Google for ad revenue?

[+] dep_b|2 years ago|reply
Did you notice that nowadays a lot of websites have a lot of uninteresting drivel giving a "background" to whatever the thing was you were searching for before you get to read (hopefully) the thing you were searching for?

People discovered that Google measures not only how much time you stay on a webpage but also how much you scroll to define how interesting a website is. So now every crappy "tech tips" website that has an answer that fits in a short paragraph now makes you scroll two pages before you get the thing you actually wanted to read.

[+] dannysullivan|2 years ago|reply
Hello from Google! I work for our search ranking team. Sadly, we can't control publishers who do things that we do not advise and do not recommend.

We have no guidance telling publishers to get rid of "old" content. That's not something we've said. I shared this week that it is not something we recommend: https://twitter.com/searchliaison/status/1689018769782476800

This also documents the many times over the years we've also pushed back on this myth: https://www.seroundtable.com/google-dont-delete-older-helpfu...

[+] demizer|2 years ago|reply
"Google is destroying the internet" is a good way to put it. Ad dollars are their only priority. I hope Google dies because of it.
[+] hliyan|2 years ago|reply
Before someone suggests a new search engine where the ranking algorithm is replaced with AI, I would like to propose a return to human-curated directories. Yahoo had one, and for a while, so did Google. It was pre-social-media and pre-wiki, so none of these directories were optimized to take advantage of crowdsourcing. Perhaps it's time to try again?

https://en.wikipedia.org/wiki/Google_Directory

[+] kristopolous|2 years ago|reply
I feel like they presume I'm a gullible person they need to protect who is just on the Internet for shopping and watching entertainment.

Wasn't the point of them tracking us so much to customize and cater our results? Why have they normalized everything to some focus group persona of Joe six-pack?

***

Let's try an experiment

Type in "Chicago ticket", which has at least four interpretations: a ticket to travel to Chicago, a citation received in Chicago, a ticket to see the musical Chicago, and a ticket to see the rock band Chicago.

For me, I get the rock band, citation, baseball and mass transit ticket, in that order.

I'm in Los Angeles, have never been to Chicago, don't watch sports, and don't listen to the rock band. Google should know this with my location, search and YouTube history but it apparently doesn't care. What do you get?

[+] joshuahaglund|2 years ago|reply
Aren't they both doing the economically incentivized thing tho? Are you saying maybe some things should be beyond economic incentives?
[+] digitcatphd|2 years ago|reply
This is interesting, as I am actually doing the same thing with a site I have as I noticed my crawl budget has gotten less especially this year and fewer new articles are being indexed.

I suspect this is a long-term play for Google to phase out this search and replace it with Bard. Think about it: all these articles are doing now is writing a verbose version of what Bard gives you directly, unless it’s new human content.

Google has in essence stolen all their information by scraping and storing in a database for its LLM and is offering its knowledge of this directly to users, so in a way, this is akin to Amazon selling its own private label products.

[+] raxxorraxor|2 years ago|reply
An article about reduced quality, arguing that Google results look like ads, was pretty popular on HN a few years ago. But I believe we have hit a new low recently. Perhaps that is true for the overall quality of publications on the net. The amount of either approved news sites without significant content or outright click farms is immense, even for topics that should net results. A news site filter would already help a lot, but even then the search seems to react only to buzzwords, and sometimes even to terms you didn't search for at all that are often associated with said buzzwords.
[+] ahmedfromtunis|2 years ago|reply
Even if we pretend for a moment that your statement, that google's search is "shitty", is universally accepted as truth, you can't blame this one on Google.

People have been committing horrifying atrocities in the name of SEO for years. I've seen it firsthand. And it spectacularly backfired each time.

This can very probably be yet another one of such cases.

[+] neycoda|2 years ago|reply
Can Google tell the difference between old relevant information and old irrelevant (or outdated) information? I'm not seeing any evidence of that. A search engine is not a subject matter expert of everything on the Internet, and it shouldn't be.
[+] DocTomoe|2 years ago|reply
In all fairness, there is some old information I would love to disappear eventually. Nothing is quite as frustrating as having a question and finding that all the tutorials are for a version of the software that is 15 years old and behaves completely differently from the new one.
[+] dumpsterdiver|2 years ago|reply
To be fair, the old internet wasn't killed. It just passed away. The curious voices that were prevalent back then are buried now.
[+] 6gvONxR4sf7o|2 years ago|reply
I don’t see why google gets the blame when spam is what forces google to search how it searches.
[+] throw10920|2 years ago|reply
> Can there be any doubt

This is such terrible, low-quality, manipulative content. It does not belong on HN.

[+] ransom1538|2 years ago|reply
I stopped using search, I have switched 90% over to chatgpt.
[+] jzb|2 years ago|reply
This is some bullshit. It’s bad enough that a lot of sites with content going back 10-20 years have linkrot or have simply gone offline. But I am at a loss for words that they’re disappearing content on purpose just for SEO rankings.

If this is what online publishing has come to we have seriously screwed up.

[+] kenjackson|2 years ago|reply
In fairness to them, if one of their top ways of getting traffic is being marked as less relevant because of older articles, what do you want them to do? Just continue to lose money because Google can't rank them appropriately?
[+] andirk|2 years ago|reply
A really awesome on-the-ground website for information about specifics in collectibles was allexperts.com (now defunct). They got bought by something about.com who then got bought by a Thought Company? and they straight up deleted all of the 10+ years of people asking common questions about collectibles and getting solid answers. Poof.
[+] barbariangrunge|2 years ago|reply
The real reason: hiding it from ai scrapers?
[+] idonotknowwhy|2 years ago|reply
The crtgaming community hate that a few years ago, they deleted all the specs for old CRT monitors :(
[+] MrVandemar|2 years ago|reply
Librarians regularly cull books. Magazines are cleared off the newsstands weekly or monthly.

Storing and indexing and maintaining old content isn't free, in either dollars or environmental footprint.

[+] chaostheory|2 years ago|reply
Is this why Google search is really terrible now? The search results are nearly unusable today compared to 1-2 years ago
[+] hindsightbias|2 years ago|reply
The only History that exists today is that which doesn’t end with a 404.

We might as well burn the libraries, they serve no modern purpose.

[+] crazygringo|2 years ago|reply
Is there any evidence this would even work?

Surely Google determines "fresh, relevant" content according to whatever has recently been published, which this doesn't change. If anything, doesn't Google consider sites with a long history of content with tons of inbound links as more authoritative and therefore higher-ranked?

This baffles me. It baffles me why this would be successful SEO -- and assuming that it actually isn't, it baffles me why CNET thinks it would be.

[+] burnhamup|2 years ago|reply
The theory I've heard is related to 'crawl budget'. Google is only going to devote a finite amount of time to indexing your site. If the number of articles on your site exceeds that time, some portion of your site won't be indexed. So by 'pruning' undesirable pages, you might boost attention on the articles you want indexed. No clue how this ends up working in practice.

Google's suggestion isn't to delete pages, but maybe to mark some pages with a noindex header.

https://developers.google.com/search/docs/crawling-indexing/...
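
For anyone wondering what "mark some pages with a noindex header" could look like in practice, here is a minimal sketch. The `X-Robots-Tag: noindex` response header is the real mechanism; the `/archive/` prefix, the `NOINDEX_PREFIXES` name and the `extra_headers` helper are all made up for illustration, and how a given site wires this into its server is an assumption.

```python
# Hypothetical path prefixes the publisher wants kept online but out of the index.
NOINDEX_PREFIXES = ("/archive/",)


def extra_headers(path: str) -> dict[str, str]:
    """Return extra HTTP response headers for the given URL path."""
    # str.startswith accepts a tuple of prefixes.
    if path.startswith(NOINDEX_PREFIXES):
        # Asks compliant search engines not to index this response.
        return {"X-Robots-Tag": "noindex"}
    return {}


archived = extra_headers("/archive/2005/old-review")
current = extra_headers("/news/todays-story")
print(archived, current)
```

The page stays reachable for readers and for archives; only the indexing hint changes.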

[+] snowwrestler|2 years ago|reply
In a major site redesign a couple years ago, we dropped 3/4 of our old URLs, and saw a big improvement in SEO metrics.

I know it doesn’t make sense and that Google says it is not necessary. But it clearly worked for us.

I think a fundamental truth about Google Search is that no one understands how it actually works anymore, including Google. They announce search algorithm updates with specific goals… and then silently roll out tweaks, more updates, etc. when the predicted effect doesn’t show up.

I think the idea that Google is in control and all the SEOs are just guessing, is wrong. I think it’s become a complex enough ML system that now all anyone can do is observe and adjust, including Google.

[+] laweijfmvo|2 years ago|reply
I have noticed some articles (and not just "Best XXX of 202Y" articles) that seem to always update their "Updated on" date which Google unhelpfully picks up and shows in search results leading me to think the page is much more recent than it is.
[+] meragrin_|2 years ago|reply
> It baffles me why this would be successful SEO -- and assuming that it actually isn't, it baffles me why CNET thinks it would be.

If the content deleted is garbage, why wouldn't it help? No clue on CNET's overall quality, but I don't have a favorable image of it. Just had a look at their main page and that did not do it any favors.

[+] ReflectedImage|2 years ago|reply
Several reasons why it works. First is the page rank algorithm will give the other pages on the site a higher score. Per the spec.

Second is there could be spam links pointing to old CNET articles that need to be wiped from CNET's site spam score.
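
A rough way to see the first point is the textbook PageRank iteration, where scores over all pages sum to 1, so fewer pages means a larger share each. This is only a sketch of the published algorithm on a made-up four-page site (`home`, `new`, `old1`, `old2` are hypothetical), not a model of what Google actually runs today.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank by power iteration over an internal link graph."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            # Ignore links to pages outside the graph; a page with no valid
            # outlinks spreads its score evenly over every page.
            targets = [t for t in outlinks if t in links] or pages
            share = damping * rank[page] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank


# Hypothetical site before and after pruning two old articles.
full = pagerank({"home": ["new", "old1", "old2"], "new": ["home"],
                 "old1": ["home"], "old2": ["home"]})
pruned = pagerank({"home": ["new"], "new": ["home"]})
print(full["new"], pruned["new"])
```

In this toy graph the remaining article's score goes up after pruning, but that says nothing about inbound links from other sites, which the deleted pages would take with them.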

[+] SoftTalker|2 years ago|reply
Perhaps sites with a small ratio of new:total content would be downranked --- but I really don't think that makes sense because that's going to be the case for any long-established site.
[+] codedokode|2 years ago|reply
Google also might be at fault for making images on web lower quality. Several years ago, Google had announced that page load speed will affect the ranking. Google's tool, PageSpeed Insights gave recommendations on improving load speed. But it also recommended to lower quality of JPEG images to the level where artifacts would be visible. So instead of proper manual testing (using eyes, not a mathematical formula) on a large set of images, some Google employee simply wrote a recommended compression level out of their head and this forced web masters to worsen the quality of the images below any acceptable level.

So it doesn't matter if the photographer or illustrator worked hard to make a beautiful image, Google's robotic decision based on some lifeless mathematical formula crossed out their efforts.

[+] petee|2 years ago|reply
I've seen a couple news sources that are altering their publish dates to show near the top of news feeds. Google will announce "3 hours old", despite being weeks old.
[+] robertkeizer|2 years ago|reply
This is why I work at archive.org. Is it perfect? No. Does it have value to society? Absolutely.
[+] jprd|2 years ago|reply
Bad. Antithetical to both Google's original ideals and the early netizen goals.

Google's deteriorating performance shouldn't result in deleting valuable historical viewpoints, journalistic trends and research material just to raise your newly AI-generated sh1t to the top of the trash fire.

[+] nologic01|2 years ago|reply
Imagine, if you will, a utopic world where a critical service such as finding anything is not dominated by one (1) entity but an actual number - such as ten (10). Sci-fi novels describe this hypothetical market structure as a "competitive market".

In this utopic arrangement, users of search services are more-or-less evenly distributed among different search providers, enjoying a variety of different takes on how to find stuff.

Search providers, continues the sci-fi imagination, keep innovating and differentiating themselves to keep an edge over competition and please their users.

Producers of content, on the other hand, cannot assume much about what the inventive and aggressively competitive group of search providers will accentuate to please their users. So they focus on... improving the quality of their content, which is what they do best anyway.

It's a win-win for users and content producers. Alas, search service providers have to actually work for their money. This is slightly discomforting to a few, but not the end of the world.

One cannot but admire the imagination of such authors. What bizarre universes they keep inventing.

[+] NelsonMinar|2 years ago|reply
Since everyone's reacting to the headline and hasn't read The Fine Article... Please let me call attention to the linked tweet from Google explicitly saying don't do this.

> Are you deleting content from your site because you somehow believe Google doesn't like "old" content? That's not a thing! Our guidance doesn't encourage this. Older content can still be helpful, too.

https://twitter.com/searchliaison/status/1689018769782476800

[+] chaos986|2 years ago|reply
SEO is a scam run by con artists. Google's worth as a search engine is its ability to rank pages by quality. SEO tries to fake quality or trick Google into ranking objectively bad sites higher.

Red Ventures is trying to get CNET to be worse, with this and the AI-written stories. Google should react by delisting all of CNET.

[+] hooby|2 years ago|reply
Something is going very wrong here.

Information on the internet should be in whatever format best suits the topic. The format that best serves the users looking for that information.

And search engines should learn to interpret that information and the various formats, in order to be able to best connect those searching for information with those providing information. Yes, the search engine should adapt to the information and its formats - not the other way round.

Instead we see "information" (or the AI-generated trite replacing it) adapt its contents and format for search engines, in a bid to ultimately best serve advertisers. And search engines too adapt and change their algorithms to best serve advertisers.

As a result it becomes ever harder and harder for users to actually find the information they want in a format that works.

It's become so bad that it's now more practicable to use advanced AI to filter out the actual information and re-format it, rather than go look for it yourself.

As a human, you no longer want to use the web, and search... you want to have a bot that does that for you... because ultimately the space has become pretty hostile to humans.

[+] Moldoteck|2 years ago|reply
I don't get it. Why don't they just update the robots.txt entries to disallow Googlebot from parsing those links? That way the content would be removed for Google but stay accessible for others.
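
To make the idea concrete, here is a small sketch using Python's standard `urllib.robotparser` to check a robots.txt that blocks only Googlebot from an archive section. The `/archive/` path, the `example.com` URL and the `ia_archiver` crawler name are placeholders. One caveat: a `Disallow` rule stops crawling, and a blocked URL can still show up in results if other pages link to it, so this is not the same as a noindex directive.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot is kept out of /archive/, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /archive/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

old_url = "https://example.com/archive/2005/old-review"
googlebot_allowed = parser.can_fetch("Googlebot", old_url)
other_allowed = parser.can_fetch("ia_archiver", old_url)
print(googlebot_allowed, other_allowed)
```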
[+] pyuser583|2 years ago|reply
This is a great example of the harms of AI.

Google presumably used AI to rank pages (it hasn’t been just PageRank for a while).

The AI has noticed people don’t engage with older content, so it deranks older content. It also deranks websites with lots of older content.

So websites pull their older content, which is an important form of historical memory.

Even if the AI isn’t actually doing this, people assume it is.

Because AIs aren’t rules based, we have to guess what it’s doing.

And we guess it’s deranking old sites.

[+] scrame|2 years ago|reply
cnet got bought by vulture capital a couple years back and already had been replacing writers with crap AI before chatGPT was a thing. this shouldn't be a surprise. everything CNET related has been a walking corpse for a while now.
[+] Roark66|2 years ago|reply
So, let me get this right. CNET started using "generative AI" to write their articles. Google no doubt detected it and down ranked them to hell. CNET stopped the AI generation and they decided to delete their archives to improve their rankings?
[+] userbinator|2 years ago|reply
Newer is not always better, especially when you're looking for information on old things, but I suspect there are vested interests who don't want us to remember how much better things were in the past, so they can continue to espouse their illusion of "progress", and this "cleaning up" of old information is contributing to that goal.

Archive.org deserves all the support it needs. If only the Wayback Machine was actually indexed and searchable too...

[+] ilrwbwrkhv|2 years ago|reply
One of my favourite websites growing up: download.com
[+] rdiddly|2 years ago|reply
This is like the flea on the dog's tail, wagging the tail and dog. I don't know how many more levels of meta we can handle.

Also, if you just dump all that content on archive.org, you're kind of just reaching into archive.org's wallet, pulling out dollar bills, and giving them to Google, whose ostensible goal was to index and make available all the world's information. I feel like that's enough irony and internet for today.

[+] epakai|2 years ago|reply
CNET did this a while back, but it didn't seem SEO related then. They used to have tons of old tech specs. I remember them being the last source of specs for an obscure managed switch. Then the whole of that data just went away with no notice. Really great resource lost.