top | item 26105890

Tell HN: A case of negative SEO I caught on my service and how I dealt with it

278 points| santah | 5 years ago

Recently, my service https://next-episode.net experienced a huge drop in Google rankings. As I've been running it for more than 15 years, this is far from the first time this has happened. Usually I've been able to attribute big fluctuations (positive or negative) either to something I did, a Google algo change, or some external factor.

For example, about 2 years ago, something similar happened. While digging through my Search Console I discovered that Russian websites generated thousands of links pointing to a page on Next Episode with pornographic keywords used as link anchors. This was so effective that they managed to get those keywords to the top of the "Top linking text" in Google Search Console - naturally (most likely) resulting in drop in rankings for the regular keywords and the domain in general.

About a week ago, while trying to investigate the current drop in rankings and browsing through my "Latest links" external links export from Google Search Console, I noticed something funny. There were thousands of links in there (from 3 domains) following the same structure as on Next Episode: domain/show-name domain/show-name/browse domain/show-name/season-1, etc.

Following these links revealed something even funnier: all of them displayed content directly from my site! Not even scraped/cached content - they were dynamically pulling content from my server and displaying it on their domain. Even the search worked, the news archive and the top charts. Here is a list of those domains as an image: https://i.imgur.com/PjNKh0b.png. I've since blocked their access, so opening any of them will not show my website right now, but here is how it looked: https://i.imgur.com/HBiL3yh.png

Now, my first thought was that those were maybe scraping the content as part of a link farm (to spam with ads?), but I also wanted to know more. I experimented with Google searches that included pages from my website, like "Hot Shows - Next Episode" and ones with very specific news posts subjects like "Streaming Services Availability added to Episodes and Movies" (posted in September last year). Imagine my surprise when I discovered that not only the domains above were indexed by Google (and were listed in the Search results), but there were 4-5 more domains that did the same thing and some of them even outranked mine!

Here is a full list of domains that I discovered by searching for my news posts subjects: https://i.imgur.com/dAm1CzI.png. If you Google for site:domain.com you'll see some of them have thousands of pages indexed by Google. Trying out more keyword searches, I was also able to discover these domains: https://i.imgur.com/s5YjJWK.png (as they've cached the content, they still work). Those all seem to be part of the same operation, but they serve a different purpose - they have only scraped the home page of Next Episode and all their links point to inside pages on the other domains. I suspect this is to generate incoming links to the other domains and give them some credibility.

As with the links with adult keywords text anchors mentioned above - I suspect this whole thing is a negative SEO campaign - I don't see any other reason for it to be happening and it seems to be achieving its goal. Once I found all I could find about the domains involved in this, I took some action:

1) disavowed all those domains through the Google disavow tool

2) investigated if I could redirect their pages to mine (as they were dynamically pulling the content - I could change it to whatever I wanted). I managed to make it work through JavaScript (though interestingly, it had to be obfuscated as they were doing some sanitizing when pulling my content and replacing strings like "window.location.href" with "window.loc1ion.href"), but in the end I decided against it and:

3) I blocked their IPs through CloudFlare (all Russian IPs). An interesting thing here is that once I blocked an IP, the domain would somehow automatically switch to another IP to pull my content from, but once I blocked like 10 or 15 of them - they seem to have run out of IPs and now they stay blocked.

I looked for a way to report those domains to Google, but as of today, I've not found the place to do it. Does anybody know? Today, about a week after I blocked the domains that pulled content from my site, they still have thousands of my pages indexed in Google and are ranking better in some search results than me. I'm guessing with time, Google will catch up with the fact they don't show any content anymore and will delist those pages.

This whole thing was very new to me so I hope it'll raise awareness that this is going on and maybe help someone else catch it happening to their website. I'd appreciate any feedback on this and I'm around if you have any questions. It would also be interesting to hear about anyone's related experiences. Cheers!

88 comments

[+] Matsta|5 years ago|reply

I'm sorry to say, but the neg SEO didn't drop your rankings, it was to do with the Google algorithm update [1]. Check the screenshot from Ahrefs [2], and your traffic drops on 3rd of December which is when the update went live. [1] https://moz.com/blog/googles-december-2020-core-update [2] https://i.imgur.com/DBkdUEk.png

Google's algorithm is smart enough to recognise Neg SEO attacks. Sure five years ago you could buy a blast of spammy links using Xrumer or GSA with some viagra anchor text and boom you're competition is gone.

From a quick glance, most of your pages have pretty thin content, and I assume it's pulling from an API, so none of it is unique. If there was one thing I would do is try to build some content on pages. A great tool to analyse and develop content that is SEO friendly is SurferSEO - highly recommend it.

I'm surprised your forum doesn't rank as well as your main site as it looks fairly active. However, I'm not sure about how PunBB does SEO wise.

[+] austhrow743|5 years ago|reply

>Google's algorithm is smart enough to recognise Neg SEO attacks. Sure five years ago you could buy a blast of spammy links using Xrumer or GSA with some viagra anchor text and boom you're competition is gone.

I distinctly remember 8 years ago dealing with negative SEO and reading the same thing everywhere while researching it. "Negative SEO used to work in the olden days of the internet, but now in the modern era Google is all over it."

I wonder if in 5 years people will be admitting that negative SEO worked 5 years ago.

[+] markdown|5 years ago|reply

> From a quick glance, most of your pages have pretty thin content, and I assume it's pulling from an API, so none of it is unique. If there was one thing I would do is try to build some content on pages.

Your advice will help OP, but it's sad to see it on here.

Adding content to sites that don't need it ruins the experience. Don't build for the bot, build for humans.

SEO is a cancer on the web. A few days ago Google directed me to a recipe for a particular type of bread. I swear I had to scroll through the authors entire life story... how their grandmother handed down this recipe from her grandmothers mother, how it feels to make you own bread, how to save your sanity with bread, how best to store bread.

The recipe at the very bottom of the page could have fit neatly above the fold.

Here are two options for OP, either of which will improve his website, unlike your suggestion: https://i.imgur.com/9VlBguW.png

[+] santah|5 years ago|reply

Hey there and thanks for looking into it.

I'm aware of the algo update that happened in December (and that it correlated very closely with my drop in rankings).

However, in my experience - even though jumps or drops in ranking are almost always triggered by such updates - there is a good reason for it to happen.

But you're right, what is happening here may've also had no effect whatsoever.

In any case - I found it because I dug up to try and what was going on and I can't explain what I found in any other way.

Even if it didn't affect ranking - does it look like negative SEO to you or do you think something else is going on here?

[+] tyingq|5 years ago|reply

"Google's algorithm is smart enough to recognise Neg SEO attacks"

This is mysterious to me. That bad links hurt your site unless someone else bought them. Google's smart and all, but?

[+] melomal|5 years ago|reply

This is it, and the Jan update and then the end of Jan update too. All of which has wiped a lot of traffic for others out there all with solid sites.

Right now getting links indexed is a 15 day+ affair for some, some have no luck at all. From search results that I have been getting, it's almost like nothing makes sense anymore. Pure garbage content is at the top, or worst of all content from 3+ years ago and content freshness is considered a key ranking factor.

[+] santah|5 years ago|reply

Btw, forgot to comment on your remark about the forum.

The thing is, I have "nofollow"-ed all links to the forum from the main site. Honestly, at this point - I don't even remember why I did it, it must've been close to 10 years ago.

Now this makes me think if I should remove those nofollows ...

[+] javajosh|5 years ago|reply

May I just say kudos, sir, for dealing with this situation with such aplomb. It is easy to imagine an alternative response, with far more anger and less curiosity. You are like a doctor looking at a disease: "Ah, look at this awful thing happening, how interesting!"

Also, given the way they were using your site, effectively reverse-proxying you and adding ads, it implies that you have access, in your server logs at least, to all of their traffic! And that might give you insight into their motivations, and maybe other elements of their operations. I mean, it sounds like a reasonably clever, small scale scam operation in Russia; but if they proved out the technique with your niche site, then they can easily duplicate with other sites, in which case it is effectively a new kind of malware that has to be solved by Google!

Last but not least, I wanted to encourage you, and others, to consider whether this kind of attack would work in a decentralized world, what search looks like in that world, and therefore how this kind of attack might be mitigated.

[+] santah|5 years ago|reply

Thanks for the kind words :) It is indeed both terrifying and super exciting and interesting to figure out what is going on (and how to stop it).

[+] santah|5 years ago|reply

Update: After a week of doing nothing - they finally noticed their thing is blocked and sprang into action.

Apparently, they expanded their pool of available IPs they pull data from and now they seem to be endless (so some of the scraping domains actually work now).

I'm investigating what I can do about it. I'd appreciate any advice!

[+] santah|5 years ago|reply

Update 2: After banning close to 2k individual IPs, it looks like I got it under control, for now.

I wonder how banning so many IPs affects CloudFlare performance and if I should optimize it to block whole IP ranges instead ...

[+] speedgoose|5 years ago|reply

Do they use a web browser for the scraping or simply a http library?

You could look at the http request headers and perhaps identify the scrapper script. You could also put a javascript challenge that is required to solve before pulling more data, and disable it for Google and Bing ips, so it's more work for them to pull data for some time.

Instead of simply blocking, you could detect them and do some kind of http slowloris response.

[+] markhowe|5 years ago|reply

Setup a honeypot page to log the ‘users’ IP. Keep hitting it via their domain and you’ll build up a list of IP’s to block?

As an aside, I’ve fought credential stuffers by returning real looking but actually false data, and initiating password resets... start serving different data on each hit, you may need to be annoying enough that they give up.

[+] eps|5 years ago|reply

They must've seen this post.

[+] arn|5 years ago|reply

Browsing through your SEO results. I also don't think the negative SEO is necessarily what did you in.

You have a very straight forward value prop. "Next episode" of some-show. I think these sort of optimized results are probably things that Google has been algorithmically adjusting for.

Looking at the Ranked 1-3 terms you dropped for, it seems you dropped some pretty big terms and even keyword terms.

You were #1 for "seal team next episode", but now you rank #3. #1 got replaced by CBS's page, which is arguably a better result.

"black clover new episode" also dropped from #1. Replaced by Wikipedia.

"the good place next episode" similar story.

I don't know what the best move is here. Algorithmic changes are really hard to combat without major changes and even then, you don't have a ton of room to wiggle with next-episode content.

[+] throwaway13337|5 years ago|reply

Wow. That's absolutely horrible.

Looking at Google's search results, it's obvious that these tactics are rampant and really winning the war here.

We need a new search engine that cannot be gamed so easily. I know it's non trivial but the stakes are high as is the reward for making such.

This is a real engineering challenge. I'm excited about the problem space and opportunity.

[+] DylanDmitri|5 years ago|reply

I think the real lesson is that under enough pressure every large system leaks. Anything that gatekeeps millions of real dollars (search engines, stock markets, Amazon reviews, insurance claims, etc) will constantly be exploited and patched by nature of the thing. Only "solution" is to decrease pressure, by say fragmenting market into 20+ search engines, so that SEO people can't realistically optimize for all of them at once.

Some smaller scope things can be made completely watertight, for example mathematically proven cryptography, but even that often leaks to government pressure.

[+] post_below|5 years ago|reply

"a new search engine that cannot be gamed"

So you mean a search engine that's 100% human curated? Or rather, a directory, it wouldn't really be a search engine.

Any algorithmic signal can be gamed. Although I'd be curious to hear how I'm wrong about that.

[+] danhak|5 years ago|reply

I don't understand the implicit assumption that a new search engine that reaches Google's scale will be more adept at curtailing abuse.

[+] judge2020|5 years ago|reply

Even though Google’s not bulletproof, I don’t think any search engine that indexes literally every page could be created to block all abuse.

[+] pilferz|5 years ago|reply

Made an account here just to make this comment: You're going to want to send DMCA notices to both the registrar(s) AND Google.

1. Compile a list of domains and sitemaps that are 100% stealing and mirroring your content.

2. Go to Google's DMCA request page: https://www.google.com/webmasters/tools/legal-removal-reques...

3. Fill out all relevant data, and submit the offending domains and URL's.

Wait a few days, and you'll be happy to see that those pages are blocked from Google entirely. Not many people know what to do when Google DMCA's them, so it could solve your problem permanently (or you can automate it).

Regarding physically blocking them from scraping your site, you've got a few options. Put Cloudflare up if it isn't already. They've got at least one anti-scraping application (Scrape Shield) that may help.

Another thing you can do is automate the scraping of their websites using distinct query parameters and try to exhaust their list of proxies by automatically logging and filtering them. This might be a fruitless endeavor if they're using rotating residential proxies though.

Hope this helps, and good luck!

[+] santah|5 years ago|reply

Thanks for the suggestions.

I've added filing a DMCA removal request to Google to the list of things to do if this continues ...

The rest of what you mentioned we've discussed in prior comments and are indeed helpful in mitigating this.

[+] juanani|5 years ago|reply

Sucks to see this. I think I even mentioned your site on here just this past week.

Didn't think I'd see the author but since you're here, thanks, this has been my go-to over the years.

[+] santah|5 years ago|reply

Ey thanks for taking the time to write this, very helpful during this stressful time :)

[+] clscott|5 years ago|reply

It's an interesting story, I won'der if you could turn the automation trick around on them.

Would you be able to make them do the same negative SEOing but to their own site?

Fill their site with unrelated garbage and internal links with undesirable anchor text.

* unbock their IP * create content that links back to their site with the undesirable keywords * only show this content to them and not regular visitors * don't let them grab much / any legitimate content

[+] santah|5 years ago|reply

I think I can do this easily yes.

I can either fill their content with pornographic stuff or simply redirect it to adult websites.

I think if that happens - Google will de-rank and de-list them super quickly.

I chose not to, because I'm not sure if it's not going to affect real people ...

[+] stickfigure|5 years ago|reply

Earlier today in another thread we were joking about GaaS (Goatse as a Service) but now maybe I think that's not so crazy an idea after all.

https://news.ycombinator.com/item?id=26104087

Your YC application practically writes itself.

[+] melomal|5 years ago|reply

Also, another update has been noticed for V-day celebrations: https://www.seroundtable.com/google-search-ranking-algorithm...

[+] slig|5 years ago|reply

This is really frustrating, thanks for sharing. Google has had decades to figure out a way to detect duplicated content, spammy sites with this structure random-spam-keyword.spam-site.xyz/more-spam-words.html and the problem seems to get worse every year.

[+] stanislavb|5 years ago|reply

I feel you. First how bad that feels and second, the amount of time you need to spend in fighting these things :/

[+] ThisIsMeEEE|5 years ago|reply

Yeah, It's frustrating though..

[+] tester34|5 years ago|reply

Of course Russians

That happens when it is legal to hack/steal/cause damage to people from other countries

[+] devlopr|5 years ago|reply

Is all of your content pulled via javascript? Could a server side language prerendering the content be part of your solution. You can still use javascript for everything else just not the content.