So Google's shitty search now economically incentivizes sites to destroy information.
Can there be any doubt that Google destroyed the old internet by becoming a bad search engine? Could their exclusion of most of the web be considered punishment for sites being so old and stable that they don't rely on Google for ad revenue?
I'll just assume you neglected to read TFA, because if you had, you would have discovered that it links to an official Google source that states CNET shouldn't be doing this.[1]
Did you notice that nowadays a lot of websites have a lot of uninteresting drivel giving a "background" to whatever the thing was you were searching for before you get to read (hopefully) the thing you were searching for?
People discovered that Google measures not only how much time you stay on a webpage but also how much you scroll to define how interesting a website is. So now every crappy "tech tips" website that has an answer that fits in a short paragraph now makes you scroll two pages before you get the thing you actually wanted to read.
Before someone suggests a new search engine where the ranking algorithm is replaced with AI, I would like to propose a return to human-curated directories. Yahoo had one, and for a while, so did Google. It was pre-social-media and pre-wiki, so none of these directories were optimized to take advantage of crowdsourcing. Perhaps it's time to try again?
I feel like they presume I'm a gullible person they need to protect who is just on the Internet for shopping and watching entertainment.
Wasn't the point of them tracking us so much to customize and cater our results? Why have they normalized everything to some focus group persona of Joe six-pack?
***
Let's try an experiment
Type in "Chicago ticket", which has at least 4 interpretations: a ticket to travel to Chicago, a citation received in Chicago, a ticket to see the musical Chicago, and a ticket to see the rock band Chicago.
For me, I get the rock band, citation, baseball and mass transit ticket, in that order.
I'm in Los Angeles, have never been to Chicago, don't watch sports, and don't listen to the rock band. Google should know this with my location, search and YouTube history but it apparently doesn't care. What do you get?
This is interesting, as I am actually doing the same thing with a site I run: I noticed my crawl budget has shrunk, especially this year, and fewer new articles are being indexed.
I suspect this is a long-term play for Google to phase out this search and replace it with Bard. Think about it: all these articles are doing now is writing a verbose version of what Bard gives you directly, unless it’s new human content.
Google has in essence stolen all their information by scraping it and storing it in a database for its LLM, and is offering its knowledge of this directly to users, so in a way, this is akin to Amazon selling its own private label products.
An article about reduced quality was pretty popular on HN a few years ago, arguing that Google results look like ads. But I believe we have hit a new low recently. Perhaps that is true for the overall quality of publications on the net. The number of either approved news sites without significant content or outright click farms is immense, even for topics that should net results. A news site filter would already help a lot, but even then the search seems to react only to buzzwords, and sometimes even to terms you didn't search for at all that are often associated with said buzzwords.
Even if we pretend for a moment that your statement, that Google's search is "shitty", is universally accepted as truth, you can't blame this one on Google.
People have been committing horrifying atrocities in the name of SEO for years. I've seen it firsthand. And it spectacularly backfired each time.
This is very probably yet another such case.
Can Google tell the difference between old relevant information and old irrelevant (or outdated) information? I'm not seeing any evidence of that. A search engine is not a subject matter expert of everything on the Internet, and it shouldn't be.
In all fairness, there is some old information I would love to disappear eventually. Nothing is quite as frustrating as having a question when all the tutorials are for a version of the software that is 15 years old and behaves completely differently from the new one.
This is some bullshit. It’s bad enough that a lot of sites with content going back 10-20 years have linkrot or have simply gone offline. But I am at a loss for words that they’re disappearing content on purpose just for SEO rankings.
If this is what online publishing has come to we have seriously screwed up.
In fairness to them, if one of their top ways of getting traffic is being marked as less relevant because of older articles, what do you want them to do? Just continue to lose money because Google can't rank them appropriately?
A really awesome on-the-ground website for information about specifics in collectibles was allexperts.com (now defunct). They got bought by something about.com who then got bought by a Thought Company? and they straight up deleted all of the 10+ years of people asking common questions about collectibles and getting solid answers. Poof.
Surely Google determines "fresh, relevant" content according to whatever has recently been published, which this doesn't change. If anything, doesn't Google consider sites with a long history of content with tons of inbound links as more authoritative and therefore higher-ranked?
This baffles me. It baffles me why this would be successful SEO -- and assuming that it actually isn't, it baffles me why CNET thinks it would be.
The theory I've heard is related to 'crawl budget'. Google is only going to devote a finite amount of time to indexing your site. If the number of articles on your site exceeds that time, some portion of your site won't be indexed. So by 'pruning' undesirable pages, you might boost attention on the articles you want indexed. No clue how this ends up working in practice.
Google's suggestion isn't to delete pages, but maybe mark some pages with a no index header.
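For reference, the noindex route could look roughly like this. It's only a minimal sketch, assuming old articles live under a hypothetical `/archive/` prefix; `extra_headers` and `NoindexArchiveHandler` are made-up names for illustration. `X-Robots-Tag: noindex` is the HTTP-header equivalent of the robots meta tag, so the page stays online for readers while asking search engines not to index it.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical URL prefix for old articles; adjust to the site's real layout.
ARCHIVE_PREFIX = "/archive/"


def extra_headers(path):
    """Return the extra response headers to send for a given request path."""
    if path.startswith(ARCHIVE_PREFIX):
        # Ask crawlers not to index the page; it remains reachable for readers.
        return {"X-Robots-Tag": "noindex"}
    return {}


class NoindexArchiveHandler(BaseHTTPRequestHandler):
    """Toy handler that serves a placeholder page and tags archive URLs."""

    def do_GET(self):
        body = b"<html><body>article text</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        for name, value in extra_headers(self.path).items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)

# To try it locally:
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 8000), NoindexArchiveHandler).serve_forever()
```

Note that a crawler has to be able to fetch the page to see the header, so this should not be combined with a robots.txt block on the same URLs.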
In a major site redesign a couple years ago, we dropped 3/4 of our old URLs, and saw a big improvement in SEO metrics.
I know it doesn’t make sense and that Google says it is not necessary. But it clearly worked for us.
I think a fundamental truth about Google Search is that no one understands how it actually works anymore, including Google. They announce search algorithm updates with specific goals… and then silently roll out tweaks, more updates, etc. when the predicted effect doesn’t show up.
I think the idea that Google is in control and all the SEOs are just guessing, is wrong. I think it’s become a complex enough ML system that now all anyone can do is observe and adjust, including Google.
I have noticed some articles (and not just "Best XXX of 202Y" articles) that seem to always update their "Updated on" date which Google unhelpfully picks up and shows in search results leading me to think the page is much more recent than it is.
> It baffles me why this would be successful SEO -- and assuming that it actually isn't, it baffles me why CNET thinks it would be.
If the content deleted is garbage, why wouldn't it help? No clue on CNET's overall quality, but I don't have a favorable image of it. Just had a look at their main page and that did not do it any favors.
Perhaps sites with a small ratio of new:total content would be downranked --- but I really don't think that makes sense because that's going to be the case for any long-established site.
Google also might be at fault for making images on the web lower quality. Several years ago, Google announced that page load speed would affect ranking. Google's tool, PageSpeed Insights, gave recommendations on improving load speed. But it also recommended lowering the quality of JPEG images to the level where artifacts would be visible. So instead of proper manual testing (using eyes, not a mathematical formula) on a large set of images, some Google employee simply picked a recommended compression level off the top of their head, and this forced webmasters to degrade their images below any acceptable level.
So it doesn't matter if the photographer or illustrator worked hard to make a beautiful image, Google's robotic decision based on some lifeless mathematical formula crossed out their efforts.
I've seen a couple news sources that are altering their publish dates to show near the top of news feeds. Google will announce "3 hours old", despite being weeks old.
Bad. Antithetical to both Google's original ideals and the early netizen goals.
Google's deteriorating performance shouldn't result in deleting valuable historical viewpoints, journalistic trends and research material just to raise your newly AI-generated sh1t to the top of the trash fire.
Imagine, if you will, a utopic world where a critical service such as finding anything is not dominated by one (1) entity but an actual number - such as ten (10). Sci-fi novels describe this hypothetical market structure as a "competitive market".
In this utopic arrangement, users of search services are more-or-less evenly distributed among different search providers, enjoying a variety of different takes on how to find stuff.
Search providers, continues the sci-fi imagination, keep innovating and differentiating themselves to keep an edge over competition and please their users.
Producers of content, on the other hand, cannot assume much about what the inventive and aggressively competitive group of search providers will accentuate to please their users. So they focus on... improving the quality of their content, which is what they do best anyway.
It's a win-win for users and content producers. Alas, search service providers have to actually work for their money. This is slightly discomforting to a few, but not the end of the world.
One cannot but admire the imagination of such authors. What bizarre universes they keep inventing.
Since everyone's reacting to the headline and hasn't read The Fine Article... Please let me call attention to the linked tweet from Google explicitly saying don't do this.
> Are you deleting content from your site because you somehow believe Google doesn't like "old" content? That's not a thing! Our guidance doesn't encourage this. Older content can still be helpful, too.
SEO is a scam run by con artists. Google's worth as a search engine is its ability to rank pages by quality. SEO tries to fake quality or trick Google into ranking objectively bad sites higher.
Red Ventures is trying to get CNET to be worse, with this and the AI-written stories. Google should react by delisting all of CNET.
Information on the internet should be in whatever format best suits the topic. The format that best serves the users looking for that information.
And search engines should learn to interpret that information and the various formats, in order to be able to best connect those searching for information with those providing information. Yes, the search engine should adapt to the information and its formats - not the other way round.
Instead we see "information" (or the AI-generated tripe replacing it) adapt its contents and format for search engines, in a bid to ultimately best serve advertisers. And search engines too adapt and change their algorithms to best serve advertisers.
As a result it becomes ever harder and harder for users to actually find the information they want in a format that works.
It's become so bad that it's now more practicable to use advanced AI to filter out the actual information and re-format it, rather than go look for it yourself.
As a human, you no longer want to use the web, and search... you want to have a bot that does that for you... because ultimately the space has become pretty hostile to humans.
I don't get why they don't just update their robots entries to disallow Googlebot from crawling those links. That way the pages would drop out of Google but stay accessible to everyone else.
CNET got bought by vulture capital a couple years back and had already been replacing writers with crap AI before ChatGPT was a thing. This shouldn't be a surprise. Everything CNET related has been a walking corpse for a while now.
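Something like the following is what that would amount to. It's just a sketch: the `/archive/` path, the `example.com` URLs and the `build_parser` helper are assumptions for illustration, and the rules are checked with Python's standard `urllib.robotparser` rather than against any real crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /archive/, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /archive/

User-agent: *
Allow: /
"""


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Googlebot is blocked from the archive but can still crawl the rest of the site.
googlebot_archive = parser.can_fetch("Googlebot", "https://example.com/archive/old-review")
googlebot_news = parser.can_fetch("Googlebot", "https://example.com/news/today")

# Any other crawler falls through to the wildcard entry and may fetch the archive.
other_archive = parser.can_fetch("ia_archiver", "https://example.com/archive/old-review")

print(googlebot_archive, googlebot_news, other_archive)  # False True True
```

One caveat: a robots.txt disallow only stops crawling. A blocked URL can still show up in results if other sites link to it, which is presumably why the noindex approach gets recommended instead.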
So, let me get this right. CNET started using "generative AI" to write their articles. Google no doubt detected it and down ranked them to hell. CNET stopped the AI generation and they decided to delete their archives to improve their rankings?
Newer is not always better, especially when you're looking for information on old things, but I suspect there are vested interests who don't want us to remember how much better things were in the past, so they can continue to espouse their illusion of "progress", and this "cleaning up" of old information is contributing to that goal.
Archive.org deserves all the support it needs. If only the Wayback Machine was actually indexed and searchable too...
This is like the flea on the dog's tail, wagging the tail and dog. I don't know how many more levels of meta we can handle.
Also, if you just dump all that content on archive.org, you're kind of just reaching into archive.org's wallet, pulling out dollar bills, and giving them to Google, whose ostensible goal was to index and make available all the world's information. I feel like that's enough irony and internet for today.
CNET did this a while back, but it didn't seem SEO related then. They used to have tons of old tech specs. I remember them being the last source of specs for an obscure managed switch. Then the whole of that data just went away with no notice. Really great resource lost.
[+] [-] pessimizer|2 years ago|reply
[+] [-] bagacrap|2 years ago|reply
[1] https://twitter.com/searchliaison/status/1689018769782476800
[+] [-] dep_b|2 years ago|reply
[+] [-] dannysullivan|2 years ago|reply
We have no guidance telling publishers to get rid of "old" content. That's not something we've said. I shared this week that it is not something we recommend: https://twitter.com/searchliaison/status/1689018769782476800
This also documents the many times over the years we've also pushed back on this myth: https://www.seroundtable.com/google-dont-delete-older-helpfu...
[+] [-] demizer|2 years ago|reply
[+] [-] hliyan|2 years ago|reply
https://en.wikipedia.org/wiki/Google_Directory
[+] [-] kristopolous|2 years ago|reply
[+] [-] joshuahaglund|2 years ago|reply
[+] [-] digitcatphd|2 years ago|reply
[+] [-] raxxorraxor|2 years ago|reply
[+] [-] weird-eye-issue|2 years ago|reply
[+] [-] ahmedfromtunis|2 years ago|reply
[+] [-] neycoda|2 years ago|reply
[+] [-] DocTomoe|2 years ago|reply
[+] [-] dumpsterdiver|2 years ago|reply
[+] [-] 6gvONxR4sf7o|2 years ago|reply
[+] [-] throw10920|2 years ago|reply
This is such terrible, low-quality, manipulative content. It does not belong on HN.
[+] [-] ransom1538|2 years ago|reply
[+] [-] jzb|2 years ago|reply
[+] [-] kenjackson|2 years ago|reply
[+] [-] andirk|2 years ago|reply
[+] [-] barbariangrunge|2 years ago|reply
[+] [-] idonotknowwhy|2 years ago|reply
[+] [-] MrVandemar|2 years ago|reply
Storing and indexing and maintaining old content isn't free, in either dollars or environmental footprint.
[+] [-] chaostheory|2 years ago|reply
[+] [-] hindsightbias|2 years ago|reply
We might as well burn the libraries, they serve no modern purpose.
[+] [-] crazygringo|2 years ago|reply
[+] [-] burnhamup|2 years ago|reply
https://developers.google.com/search/docs/crawling-indexing/...
[+] [-] snowwrestler|2 years ago|reply
[+] [-] laweijfmvo|2 years ago|reply
[+] [-] meragrin_|2 years ago|reply
[+] [-] ReflectedImage|2 years ago|reply
Second, there could be spam links pointing to old CNET articles that need to be wiped from CNET's site spam score.
[+] [-] SoftTalker|2 years ago|reply
[+] [-] codedokode|2 years ago|reply
[+] [-] petee|2 years ago|reply
[+] [-] robertkeizer|2 years ago|reply
[+] [-] jprd|2 years ago|reply
[+] [-] nologic01|2 years ago|reply
[+] [-] NelsonMinar|2 years ago|reply
https://twitter.com/searchliaison/status/1689018769782476800
[+] [-] chaos986|2 years ago|reply
[+] [-] hooby|2 years ago|reply
[+] [-] skilled|2 years ago|reply
https://www.seroundtable.com/google-dont-delete-older-helpfu...
[+] [-] Moldoteck|2 years ago|reply
[+] [-] pyuser583|2 years ago|reply
Google presumably used AI to rank pages (it hasn’t been just PageRank for a while).
The AI has noticed people don’t engage with older content, so it deranks older content. It also deranks websites with lots of older content.
So websites pull their older content, which is an important form of historical memory.
Even if the AI isn’t actually doing this, people assume it is.
Because AIs aren’t rules based, we have to guess what it’s doing.
And we guess it’s deranking old sites.
[+] [-] scrame|2 years ago|reply
[+] [-] Roark66|2 years ago|reply
[+] [-] userbinator|2 years ago|reply
[+] [-] ilrwbwrkhv|2 years ago|reply
[+] [-] rdiddly|2 years ago|reply
[+] [-] epakai|2 years ago|reply