Tell HN: A case of negative SEO I caught on my service and how I dealt with it
278 points| santah | 5 years ago
For example, about two years ago, something similar happened. While digging through my Search Console, I discovered that Russian websites had generated thousands of links pointing to a page on Next Episode, with pornographic keywords used as link anchors. This was so effective that they managed to get those keywords to the top of the "Top linking text" report in Google Search Console - naturally (most likely) resulting in a drop in rankings for the regular keywords and for the domain in general.
About a week ago, while trying to investigate the current drop in rankings and browsing through my "Latest links" external links export from Google Search Console, I noticed something funny. There were thousands of links in there (from 3 domains) following the same structure as on Next Episode: domain/show-name, domain/show-name/browse, domain/show-name/season-1, etc.
Following these links revealed something even funnier: all of them displayed content directly from my site! Not even scraped/cached content - they were dynamically pulling content from my server and displaying it on their domain. Even the search worked, as did the news archive and the top charts. Here is a list of those domains as an image: https://i.imgur.com/PjNKh0b.png. I've since blocked their access, so opening any of them will not show my website right now, but here is how it looked: https://i.imgur.com/HBiL3yh.png
Now, my first thought was that those were maybe scraping the content as part of a link farm (to spam with ads?), but I also wanted to know more. I experimented with Google searches that included pages from my website, like "Hot Shows - Next Episode", and ones with very specific news post subjects like "Streaming Services Availability added to Episodes and Movies" (posted in September last year). Imagine my surprise when I discovered that not only were the domains above indexed by Google (and listed in the search results), but there were 4-5 more domains doing the same thing - and some of them even outranked mine!
Here is a full list of domains that I discovered by searching for my news posts subjects: https://i.imgur.com/dAm1CzI.png. If you Google for site:domain.com you'll see some of them have thousands of pages indexed by Google. Trying out more keyword searches, I was also able to discover these domains: https://i.imgur.com/s5YjJWK.png (as they've cached the content, they still work). Those all seem to be part of the same operation, but they serve a different purpose - they have only scraped the home page of Next Episode and all their links point to inside pages on the other domains. I suspect this is to generate incoming links to the other domains and give them some credibility.
As with the adult-keyword link anchors mentioned above, I suspect this whole thing is a negative SEO campaign - I don't see any other reason for it to be happening, and it seems to be achieving its goal. Once I had found out all I could about the domains involved, I took some action:
1) disavowed all those domains through the Google disavow tool
2) investigated if I could redirect their pages to mine (as they were dynamically pulling the content, I could change it to whatever I wanted). I managed to make it work through JavaScript (though interestingly, it had to be obfuscated, as they were doing some sanitizing when pulling my content and replacing strings like "window.location.href" with "window.loc1ion.href" - see the sketch after this list), but in the end I decided against it and:
3) I blocked their IPs through CloudFlare (all Russian IPs). An interesting thing here is that once I blocked an IP, the domain would somehow automatically switch to another IP to pull my content from, but once I blocked 10 or 15 of them, they seemed to run out of IPs and now they stay blocked.
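To illustrate the idea from step 2: here is a minimal sketch of what such an obfuscated redirect could look like (illustrative only - the hostname check and URL are placeholders, not my exact code). Assembling the property names at runtime means a literal find-and-replace on "window.location.href" has nothing to match:

    // Only redirect when the page is served from a foreign host, and build
    // the property names at runtime so a naive string replacement on
    // "window.location.href" never fires.
    (function () {
      var here = document['loc' + 'ation'];          // document.location
      if (here.hostname.indexOf('next-episode') === -1) {
        window['loc' + 'ation']['hr' + 'ef'] =       // window.location.href = ...
          'https://next-episode.net' + here.pathname + here.search;
      }
    })();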
I looked for a way to report those domains to Google, but as of today, I've not found the place to do it. Does anybody know? Today, about a week after I blocked the domains that pulled content from my site, they still have thousands of my pages indexed in Google and are ranking better in some search results than me. I'm guessing with time, Google will catch up with the fact they don't show any content anymore and will delist those pages.
This whole thing was very new to me so I hope it'll raise awareness that this is going on and maybe help someone else catch it happening to their website. I'd appreciate any feedback on this and I'm around if you have any questions. It would also be interesting to hear about anyone's related experiences. Cheers!
[+] [-] Matsta|5 years ago|reply
Google's algorithm is smart enough to recognise negative SEO attacks. Sure, five years ago you could buy a blast of spammy links using Xrumer or GSA with some viagra anchor text and boom - your competition is gone.
From a quick glance, most of your pages have pretty thin content, and I assume it's pulled from an API, so none of it is unique. If there's one thing I would do, it's try to build up the content on those pages. A great tool for analysing and developing SEO-friendly content is SurferSEO - highly recommend it.
I'm surprised your forum doesn't rank as well as your main site, as it looks fairly active. However, I'm not sure how PunBB does SEO-wise.
[+] [-] austhrow743|5 years ago|reply
I distinctly remember 8 years ago dealing with negative SEO and reading the same thing everywhere while researching it. "Negative SEO used to work in the olden days of the internet, but now in the modern era Google is all over it."
I wonder if in 5 years people will be admitting that negative SEO worked 5 years ago.
[+] [-] markdown|5 years ago|reply
Your advice will help OP, but it's sad to see it on here.
Adding content to sites that don't need it ruins the experience. Don't build for the bot, build for humans.
SEO is a cancer on the web. A few days ago Google directed me to a recipe for a particular type of bread. I swear I had to scroll through the author's entire life story... how their grandmother handed down this recipe from her grandmother's mother, how it feels to make your own bread, how to save your sanity with bread, how best to store bread.
The recipe at the very bottom of the page could have fit neatly above the fold.
Here are two options for OP, either of which will improve his website, unlike your suggestion: https://i.imgur.com/9VlBguW.png
[+] [-] santah|5 years ago|reply
I'm aware of the algo update that happened in December (and that it correlated very closely with my drop in rankings).
However, in my experience - even though jumps or drops in ranking are almost always triggered by such updates - there is a good reason for it to happen.
But you're right - what is happening here may also have had no effect whatsoever.
In any case, I found it because I dug in to try and understand what was going on, and I can't explain what I found in any other way.
Even if it didn't affect ranking - does it look like negative SEO to you or do you think something else is going on here?
[+] [-] tyingq|5 years ago|reply
This is what's mysterious to me: that bad links can hurt your site even when someone else bought them. Google's smart and all, but apparently not here?
[+] [-] melomal|5 years ago|reply
Right now, getting links indexed is a 15+ day affair for some, and some have no luck at all. From the search results I have been getting, it's almost like nothing makes sense anymore. Pure garbage content is at the top - or, worst of all, content from 3+ years ago, even though content freshness is considered a key ranking factor.
[+] [-] santah|5 years ago|reply
The thing is, I have "nofollow"-ed all links to the forum from the main site. Honestly, at this point - I don't even remember why I did it, it must've been close to 10 years ago.
Now this makes me wonder whether I should remove those nofollows ...
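(For context: "nofollow"-ing a link just means adding a rel attribute that tells search engines not to pass ranking credit through it. Illustrative markup only - not the actual link:)

    <a href="https://forum.example.com/" rel="nofollow">Visit the forum</a>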
[+] [-] javajosh|5 years ago|reply
Also, given the way they were using your site - effectively reverse-proxying you and adding ads - it implies that you have access, in your server logs at least, to all of their traffic! And that might give you insight into their motivations, and maybe other elements of their operations. I mean, it sounds like a reasonably clever, small-scale scam operation in Russia; but if they proved out the technique with your niche site, they can easily duplicate it with other sites, in which case it is effectively a new kind of malware that has to be solved by Google!
Last but not least, I wanted to encourage you, and others, to consider whether this kind of attack would work in a decentralized world, what search looks like in that world, and therefore how this kind of attack might be mitigated.
[+] [-] santah|5 years ago|reply
Apparently, they've expanded the pool of IPs they pull data from, and now it seems to be endless (so some of the scraping domains actually work again).
I'm investigating what I can do about it. I'd appreciate any advice!
[+] [-] santah|5 years ago|reply
I wonder how banning so many IPs affects CloudFlare performance and if I should optimize it to block whole IP ranges instead ...
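From what I can tell, Cloudflare's IP Access Rules accept CIDR ranges as well as single IPs, so blocking whole ranges should be possible. A rough, untested sketch against their API (zone ID and token are placeholders; needs Node 18+ for the built-in fetch):

    // Block a whole CIDR range via Cloudflare's IP Access Rules API
    // instead of one IP at a time. ZONE_ID and API_TOKEN are placeholders.
    const ZONE_ID = 'your-zone-id';
    const API_TOKEN = 'your-api-token';

    async function blockRange(cidr) {
      const res = await fetch(
        'https://api.cloudflare.com/client/v4/zones/' + ZONE_ID +
          '/firewall/access_rules/rules',
        {
          method: 'POST',
          headers: {
            'Authorization': 'Bearer ' + API_TOKEN,
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            mode: 'block',
            // IPv4 ranges go in as /16 or /24
            configuration: { target: 'ip_range', value: cidr },
            notes: 'scraper proxy pool',
          }),
        }
      );
      return res.json();
    }

    blockRange('198.51.100.0/24').then(console.log);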
[+] [-] speedgoose|5 years ago|reply
You could look at the HTTP request headers and perhaps identify the scraper script. You could also add a JavaScript challenge that has to be solved before pulling more data, and disable it for Google and Bing IPs, so it's more work for them to pull data, at least for some time.
Instead of simply blocking them, you could detect them and serve some kind of slowloris-style HTTP response.
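A slow-drip response along those lines only takes a few lines in Node. A sketch, with a placeholder heuristic standing in for real scraper detection:

    // Suspected scrapers get the response trickled out one byte per
    // second; everyone else gets the normal page immediately.
    const http = require('http');

    function looksLikeScraper(req) {
      // placeholder heuristic - real detection would use your own signals
      return !req.headers['accept-language'];
    }

    http.createServer(function (req, res) {
      if (looksLikeScraper(req)) {
        res.writeHead(200, { 'Content-Type': 'text/html' });
        var junk = '<html><body>loading...</body></html>';
        var i = 0;
        var drip = setInterval(function () {
          if (i >= junk.length) { clearInterval(drip); res.end(); return; }
          res.write(junk[i++]);              // one byte per second
        }, 1000);
        req.on('close', function () { clearInterval(drip); });
        return;
      }
      res.writeHead(200);
      res.end('normal page');
    }).listen(8080);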
[+] [-] markhowe|5 years ago|reply
As an aside, I've fought credential stuffers by returning real-looking but actually false data, and by initiating password resets... If you start serving different data on each hit, you may be able to be annoying enough that they give up.
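Something in this spirit (a sketch - the flagged-IP list and the payload shape are made up for illustration):

    // Flagged clients keep getting 200s with randomized, plausible-looking
    // but fake payloads, so they can't tell when the real data stopped.
    const http = require('http');
    const crypto = require('crypto');

    const flagged = new Set(['203.0.113.7']);   // placeholder IP list

    function fakeShow() {
      return {
        name: 'Show ' + crypto.randomBytes(3).toString('hex'),
        nextEpisode: 'S0' + (1 + Math.floor(Math.random() * 9)) +
                     'E' + (1 + Math.floor(Math.random() * 20)),
      };
    }

    http.createServer(function (req, res) {
      if (flagged.has(req.socket.remoteAddress)) {
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify(fakeShow()));    // different garbage every hit
        return;
      }
      res.end('real content goes here');
    }).listen(8080);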
[+] [-] arn|5 years ago|reply
You have a very straightforward value prop: the "next episode" of some show. I think these sorts of optimized results are probably things that Google has been algorithmically adjusting for.
Looking at the terms you ranked 1-3 for and have since dropped on, it seems you lost some pretty big terms, even core keyword terms.
You were #1 for "seal team next episode", but now you rank #3. #1 got replaced by CBS's page, which is arguably a better result.
"black clover new episode" also dropped from #1. Replaced by Wikipedia.
"the good place next episode" similar story.
I don't know what the best move is here. Algorithmic changes are really hard to combat without major changes and even then, you don't have a ton of room to wiggle with next-episode content.
[+] [-] throwaway13337|5 years ago|reply
Looking at Google's search results, it's obvious that these tactics are rampant and really winning the war here.
We need a new search engine that cannot be gamed so easily. I know it's non-trivial, but the stakes are high, and so is the reward for building one.
This is a real engineering challenge. I'm excited about the problem space and opportunity.
[+] [-] DylanDmitri|5 years ago|reply
Some smaller-scope things can be made completely watertight - for example, mathematically proven cryptography - but even that often yields to government pressure.
[+] [-] post_below|5 years ago|reply
So you mean a search engine that's 100% human curated? Or rather, a directory, it wouldn't really be a search engine.
Any algorithmic signal can be gamed. Although I'd be curious to hear how I'm wrong about that.
[+] [-] pilferz|5 years ago|reply
1. Compile a list of domains and sitemaps that are 100% stealing and mirroring your content.
2. Go to Google's DMCA request page: https://www.google.com/webmasters/tools/legal-removal-reques...
3. Fill out all relevant data, and submit the offending domains and URLs.
Wait a few days, and you'll be happy to see that those pages are blocked from Google entirely. Not many people know what to do when Google DMCAs them, so it could solve your problem permanently (or you can automate it).
Regarding physically blocking them from scraping your site, you've got a few options. Put Cloudflare up if it isn't already. They've got at least one anti-scraping application (Scrape Shield) that may help.
Another thing you can do is automate the scraping of their websites using distinct query parameters and try to exhaust their list of proxies by automatically logging and filtering them. This might be a fruitless endeavor if they're using rotating residential proxies though.
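Concretely, the proxy-exhaustion idea could look like this (a sketch - it assumes their mirrors forward the full path and query string to your origin; the mirror URL is a placeholder):

    // Hit the mirror with a unique token, then search your own access
    // logs for that token: whichever IP requested it from your origin
    // is one of their proxies. Needs Node 18+ for the built-in fetch.
    const crypto = require('crypto');

    async function probeMirror(mirrorBase) {
      const token = crypto.randomBytes(8).toString('hex');
      await fetch(mirrorBase + '/some-show?probe=' + token);
      console.log('now grep your access logs for:', token);
      return token;
    }

    probeMirror('https://mirror-domain.example');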
Hope this helps, and good luck!
[+] [-] santah|5 years ago|reply
I've added filing a DMCA removal request to Google to the list of things to do if this continues ...
The rest of what you mentioned has been discussed in earlier comments, and it's indeed helpful in mitigating this.
[+] [-] juanani|5 years ago|reply
Didn't think I'd see the author but since you're here, thanks, this has been my go-to over the years.
[+] [-] clscott|5 years ago|reply
Could you make them do the same negative SEO-ing, but to their own sites?
Fill their site with unrelated garbage and internal links with undesirable anchor text.
* unblock their IPs
* create content that links back to their sites with the undesirable keywords
* only show this content to them, not to regular visitors
* don't let them grab much (or any) legitimate content
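Sketched out, that could look something like this (Express-style; the IPs, domain, and anchor text are placeholders):

    // Known scraper IPs get a poisoned page full of links back to their
    // own domains with undesirable anchor text; everyone else gets the
    // real site.
    const express = require('express');
    const app = express();

    const scraperIPs = new Set(['203.0.113.7', '203.0.113.8']);

    app.use(function (req, res, next) {
      if (scraperIPs.has(req.ip)) {
        return res.send(
          '<html><body>' +
            '<a href="https://scraper-domain.example/">undesirable anchor text</a>' +
            '</body></html>'
        );
      }
      next();   // regular visitors see the real site
    });

    app.get('/', function (req, res) { res.send('real homepage'); });
    app.listen(8080);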
[+] [-] santah|5 years ago|reply
I can either fill their content with pornographic stuff or simply redirect it to adult websites.
I think if that happens - Google will de-rank and de-list them super quickly.
I chose not to, because I'm not sure it wouldn't affect real people ...
[+] [-] stickfigure|5 years ago|reply
https://news.ycombinator.com/item?id=26104087
Your YC application practically writes itself.
[+] [-] tester34|5 years ago|reply
That's what happens when it's legal to hack, steal from, and cause damage to people in other countries.
[+] [-] santah|5 years ago|reply
Do you see anything wrong?