Would have been dealt with even earlier, but our experiments with it (http://meta.stackoverflow.com/a/69032/130213) led us to believe Google was starting to treat us as a link farm.
What we were seeing in our earlier attempt wasn't pages appearing lower in results (which could reasonably be expected in some cases as a result of the change; and we wouldn't care, 2nd on a search under an equally good resource is fine by us), but pages being removed entirely; almost universally pages that had newly un-nofollow-ed links to stores, or ad-laden sites (mostly legitimate posts, but some spam that stuck around for a bit before the community deleted it). So, "classified as link farm" seemed like the most likely problem. Google naturally won't tell you why your ranking drops, so it was (and remains) just an educated guess as to what happened.
So we stopped, putting it on the "wish we could, but reality doesn't let us"-list until Google reached out to us (among thousands of others, I'm sure) to change our nofollow practices. Google didn't describe any changes in their algorithm, but it seems reasonable to me that there would have been some tweaks around nofollow to accompany a new policy; again just an educated guess.
Basically, this is an old post complaining about a long since addressed concern that we had tried to address even earlier but ran into practical problems with.
While the exact details of our algorithm are secret by necessity, I will say that we've had to consider posts individually to prevent spammers from kiting a single account up to post nofollow-less spam content. People still try (it's kind of astounding how much spammers try; I suspect SEO's opaqueness cuts both ways here), but it doesn't work. Well, you can get one link in your profile, but there's less SEO juice to pass and you are hard-capped at one, no matter how long it takes someone to delete your account.
Disclaimer: Stack Exchange employee. I was on all the relevant calls, but it has been a couple of years, so grain of salt and all that.
I'm not sure why this made HN 2 years after posting it, but I wanted to make a comment.
SO has implemented a way to remove the nofollow links, but it is way too strict, and probably only affects a very very very small percentage of answers. I'd bet less than 0.1%.
For example see this answer with 74 upvotes from a user with almost 100k reputation.
The links are to MSDN (which is probably not spam by definition) and to a quoted source on techbubbles.com.
http://stackoverflow.com/questions/2660355/net-4-0-has-a-new...
This was also the exact time of all the Google Panda tweaks and the fallout from Google starting to penalize creative commons scrapers of Stack Overflow content. There were a lot of balls in the air and only a few of them were ours, so we wanted to be careful.
When Google is 90% of your traffic, you REALLY REALLY REALLY do not want to get on their bad side, even accidentally, so I hope you can understand why we wanted to be cautious to the extreme here. If Google decides you're not doing things right, it is literally a business-ending move.
Unless your VCs and investors are cool with you losing, y'know, ninety percent of your traffic.
I've seen this happen to other sites: something angers the Google algo and organic traffic falls off a cliff.
History seems to bear out that SO correctly identified the source of the "problem", and SO is such a valuable resource that I'd rather it be Google-visible over passing pagerank.
This is the problem when you have a single player dominating the search space with an opaque algo and a very limited appeals process.
I think "dealt with" is a strong way to phrase it.
The site still applies rel=nofollow to what seems like the majority of outgoing links. Since I would call the overwhelming majority of content on Stack Overflow not spam, there are a lot of good links getting nothing.
I guess I'm in the minority, but I'm not really buying that this is abusive.
Yes, it would be nice if SO removed the nofollow from known-good posts (which, indeed, it seems they are now doing), but adding nofollow is a pretty simple and reasonable way to make the site a whole lot less attractive to some of the most obnoxious sort of spammers. I would hope nobody is adding links to their SO answers with the expectation of an SEO benefit.
If there's a link in a Stack Overflow answer of mine, it's there because there's really good information at the end of the link. That's the sort of information Google & Bing use to make their searches useful; it's the sort of information I want Google et al. to know.
I thought that the entire point of nofollow was to apply it to links that come from user content so that spamming your site becomes less useful. Before this article, I never even heard of the idea that you're supposed to apply nofollow selectively based on your own judgment of how spammy a post was. I always thought it was a simple dichotomy: links created by users get nofollow, and links you create yourself don't.
Yes, I would be happy with the traffic from SO, though the SEO benefit from a "follow" link wouldn't hurt. I have seen the tricks that spammers use to get follow links, and I am sure SO would be spammed like crazy if it allowed them.
Sites are free to overuse 'rel=nofollow', and search engines are free to selectively ignore it (for either link-discovery or ranking purposes).
The strongest point I see made by this author is about SO's hypocrisy: SO requires attribution to be with a link, and specifies that the link must not have 'rel=nofollow'. Yet their content relies heavily on references to elsewhere which are all 'rel=nofollow'ed.
A sense of fair play in attribution, and spirit of mutual assistance between reliable authorities, would suggest allowing at least some well-vetted outlinks to be unencumbered.
If you republish this content, we require that you: [...]
Hyperlink directly to the original question on the source site [...]
By “directly”, I mean each hyperlink must point directly to our domain in standard HTML visible even with JavaScript disabled, and not use a tinyurl or any other form of obfuscation or redirection. Furthermore, the links must not be nofollowed.
This is about the spirit of fair attribution. Attribution to the website, and more importantly, to the individuals who so generously contributed their time to create that content in the first place!
Anyway, I hope that clears up any confusion — feel free to remix and reuse to your heart’s content, as long as a good faith effort is made to attribute the content!
If the link to stackexchange was user-generated content I seriously doubt they would actually expect it to have nofollow.
The outgoing links to any site from the stackexchange blog do not have nofollow. Any content that they are explicitly curating does not have nofollow, any user content does. I think that is a reasonably consistent policy.
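To make the "curated links are followed, user links are not" rule concrete, here is a minimal sketch of how a site might emit its anchors. This is an illustration only, not Stack Exchange's actual rendering code; the function name `render_link` and the `user_generated` flag are hypothetical.

```python
from html import escape


def render_link(url: str, text: str, user_generated: bool) -> str:
    """Build an anchor tag, adding rel="nofollow" only to user-submitted links.

    Hypothetical helper for illustration; not taken from any real site.
    """
    rel = ' rel="nofollow"' if user_generated else ""
    # Escape both the URL and the link text so user input cannot break out
    # of the attribute or inject markup.
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(text)}</a>'
```

Under this sketch, a link in a staff blog post would be rendered with `user_generated=False`, and every link inside a question, answer, or comment with `user_generated=True`.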
The author doesn't address this response by Jeff Atwood:
URGENT UPDATE
We were seeing a significant drop in Google (organic) traffic for Server Fault after instituting this "follow links if enough upvotes post-edit or post-create" policy.
We traced it back to what we currently think are a string of posts on Server Fault that got nofollow removed through "trust", but were being interpreted by Google as link farms or spammy pages.
When the sites that informed various well regarded answers on the sites get their due, their search engine rank rises. With such a well ranked site now pointing towards them, they rise in the results, often higher than the SO answer referencing it.
As a secondary issue, there's a relatively common black-hat SEO practice of buying a well-regarded domain and soaking its links for all they're worth to promote something. Their switch-flipping may have run afoul of systems designed to mitigate that.
Although I get the concern here, it would seem that if you removed the nofollows, it would open up the site to all kinds of abuse and diminish the value of the content very quickly. It doesn't strike me as a very difficult task to create a few thousand profiles and have them vote for each other (and also give a ton of votes to random people to hide better), and then use the collected power for SEO spam.
I agree with you that there has to be a better solution, but it doesn't strike me as a very trivial one... any thoughts on how one would approach this?
>It doesn't strike me as a very difficult task to create a few thousand profiles and have them vote for each other (and also give a ton of votes to random people to hide better), and then use the collected power for SEO spam.
I have to disagree. I bet this kind of abuse is well studied by any major site relying on a reputation system/user content. I can't point to anything specific, but off the top of my head discovering that kind of abuse seems exactly like finding strongly connected components: http://en.wikipedia.org/wiki/Strongly_connected_component
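As a rough sketch of that idea: treat each vote as a directed edge from voter to recipient and look for strongly connected components, i.e. groups of accounts that can all reach each other through chains of votes. The code below uses Kosaraju's algorithm. The `(voter, recipient)` edge format, the function names, and the `min_size` cutoff are assumptions made for illustration; nothing here describes how Stack Overflow actually detects voting rings.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Kosaraju's algorithm over an iterable of (voter, recipient) pairs."""
    graph, reverse = defaultdict(list), defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        reverse[dst].append(src)
        nodes.update((src, dst))

    # Pass 1: iterative DFS on the forward graph, recording finish order.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph in decreasing finish order.
    # Each tree found this way is one strongly connected component.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


def suspicious_rings(edges, min_size=3):
    """Return components with at least min_size accounts (hypothetical cutoff)."""
    return [c for c in strongly_connected_components(edges) if len(c) >= min_size]
```

For example, with votes a→b, b→c, c→a, c→d and e→a, only the accounts a, b and c form a ring. On a real vote graph, ordinary active users would probably end up in large components as well, so this would be at most a first filter before looking at timing, account age and similar signals.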
So you could probably prevent this algorithmically, but even if you couldn't... the site is heavily moderated. Bad questions and answers are down voted or closed. In my experience, getting lots of reputation on SO actually requires a lot of persistent effort. I have just over 1000 rep myself and it felt like it took forever to get there. Even if you could get lots of upvotes, you'd still be looking at daily reputation limits. And finally, you'd waste all that on one or two spam links only to get perma-banned and have all your links deleted?
A combination of high reputation + an initial nofollow time period (to allow spam links to be discovered) seems like it would be pretty effective.
There's really only one route to taking advantage of the google juice: http://xkcd.com/810/
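The "high reputation plus initial nofollow period" rule suggested above could look something like the sketch below. The thresholds (`MIN_REPUTATION`, `QUARANTINE`) and the `Post` fields are made-up values for illustration, not anything Stack Overflow has published.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MIN_REPUTATION = 1000            # hypothetical reputation threshold
QUARANTINE = timedelta(days=7)   # hypothetical window for spam to be noticed


@dataclass
class Post:
    author_reputation: int
    created_at: datetime


def should_nofollow(post: Post, now: datetime) -> bool:
    """Keep nofollow unless the author is trusted AND the post has aged."""
    if post.author_reputation < MIN_REPUTATION:
        return True
    # Even trusted authors' links stay nofollowed during the review window,
    # giving the community time to flag and delete spam.
    return now - post.created_at < QUARANTINE
```

A fuller version would presumably also look at the post's own votes and edit history, since, as noted elsewhere in this thread, judging by account alone lets a spammer build up one account and then abuse it.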
"it would seem that if you removed the nofollows, it would open up the site to all kinds of abuse"
That's why the author of the article suggested that they remove "nofollow" for users who have an established reputation, like Slashdot does. That algorithm would seem to be very easy to implement.
As the article mentions, at a certain reputation level Stack Exchange users are granted a followed link in their profile. Similar metrics could be applied to questions and answers, with the "nofollow" being removed from highly rated user content which is unlikely to be spam.
Still a hard problem but author reputation and question/answer votes give a good set of data from which we might identify links to sources which deserve attribution.
The article seems to argue that because there is no perfect solution to spam (i.e. spammers will still want their links on SO even if they are nofollowed), then there is no point in marginally reducing the incentive to spam.
I'm confident that Stack Overflow already utilizes voting ring detection logic. High karma on SO is quite valuable, so I'm sure many have already tried to game the system: people who are probably a lot better at evading voting ring detection than the average spammer...
SO gets lots of googlejuice, and because it's gameable, they appear to want to give some of that googlejuice only to "reputable" links (which they do for links in high-rep users' profiles). That's a really hard thing to judge, but the fact that they're thinking about it, talking about it, and soliciting feedback on it means that they're trying to do the right thing, even if the article's author doesn't think their "right thing" is good enough for him.
I can understand why they do it - diminishing the page rank of sites that give answers to questions on Stack Overflow causes Stack Overflow to gain more traffic - but it still seems kind of underhanded. They rely on others for the content of the site, so they should give credit to the producers of that content - without it, they would have no service at all.
This nofollow stuff has always seemed very weird to me for several reasons.
- Search Engines are supposed to be analyzing the web, figuring out what's important. They have the incentive to do this well. The websites they are analyzing do not. Websites have an incentive to nofollow everything. They might be worried about their own rankings, but why should they care about the sites they link to? Nofollow is safe and harmless. No-nofollow should get you in trouble. What does a site gain from not using nofollow links on everything?
- To enforce nofollow rules, Google are supposedly disciplining sites by hurting their organic rankings. But a page with nofollow links appears exactly the same to a user. It's just as right an answer as it was before. Is Google lowering the quality of its results to police the web?
- If Google are able to detect "link farms" and user generated content that should have been nofollow-ed, why don't they just treat those links as nofollow and ignore them? Use this detection to analyze the web, not police it.
- Are they ignoring important information sources? A huge, information-rich portion of the web is user generated, e.g. Wikipedia and Stack Overflow. How can Google really be ignoring these links as data sources? The links on a Wikipedia page, for example, are very informative. If some webpage is mentioned frequently in Stack Overflow questions and answers, it's probably important and it's probably a good answer to a lot of questions people are asking Google.
I really don't see SO as being the problem here. I happen to think that secret SEO/page rank algorithms are the culprit in many ways. Google is in the unique position of being able to write a "how to behave" guide for the internet. Play by their rules and you are golden; start to violate them and you start to be "punished" in a gradual (but exponentially more severe) way. The whole time, full feedback and data on what you are doing wrong would be provided. As this approach evolves, spammers would be less and less successful and might just have to resort to legitimate means of marketing.
Furthermore, the search engines could nip this kind of abuse in the bud, by ignoring the "nofollow" attribute of the links on a site if it determines that site is abusing it (e.g. if all or some high percentage of the links on the site have "nofollow").
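A crude version of that check is easy to sketch: count how many of a page's links carry `rel="nofollow"` and compare the share against a cutoff. The 0.9 threshold and the function names below are assumptions for illustration; whether any search engine does something like this is not known from this thread.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Counts <a href=...> tags and how many of them carry rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.nofollow = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if "href" not in attributes:
            return  # named anchors are not outgoing links
        self.total += 1
        # rel is a space-separated token list; a valueless attribute gives None.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow += 1


def nofollow_ratio(html: str) -> float:
    """Share of links on the page that are nofollowed (0.0 if no links)."""
    counter = LinkCounter()
    counter.feed(html)
    counter.close()
    return counter.nofollow / counter.total if counter.total else 0.0


def overuses_nofollow(html: str, threshold: float = 0.9) -> bool:
    """Hypothetical rule: flag pages whose links are almost all nofollowed."""
    return nofollow_ratio(html) >= threshold
```

A crawler applying this per site rather than per page would aggregate the counts across pages before comparing against the threshold.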
They're technically using it for its purpose, or they can claim they are even if they have a different motive. Stack Overflow allows links in comments and posts and this could be abused.
Sites with high pagerank that allow user submitted links are few and far between. If they didn't put nofollow (at least for new accounts), SO is popular enough that it would have to deal with targeted spam attacks aimed specifically at them. That is a sucky problem to deal with.
Esprit d'escalier: "In my day (usenet), we didn't have links--and we liked it." (old SNL joke for those who don't get it).
[+] [-] kmontrose|13 years ago|reply
[+] [-] bbondy|13 years ago|reply
[+] [-] codinghorror|13 years ago|reply
[+] [-] nikatwork|13 years ago|reply
[+] [-] preinheimer|13 years ago|reply
[+] [-] eli|13 years ago|reply
[+] [-] bryanlarsen|13 years ago|reply
[+] [-] mikeash|13 years ago|reply
[+] [-] manojlds|13 years ago|reply
[+] [-] the_gipsy|13 years ago|reply
[+] [-] gojomo|13 years ago|reply
[+] [-] nerfhammer|13 years ago|reply
If you republish this content, we require that you: [...]
Hyperlink directly to the original question on the source site [...]
By “directly”, I mean each hyperlink must point directly to our domain in standard HTML visible even with JavaScript disabled, and not use a tinyurl or any other form of obfuscation or redirection. Furthermore, the links must not be nofollowed.
This is about the spirit of fair attribution. Attribution to the website, and more importantly, to the individuals who so generously contributed their time to create that content in the first place!
Anyway, I hope that clears up any confusion — feel free to remix and reuse to your heart’s content, as long as a good faith effort is made to attribute the content!
http://blog.stackoverflow.com/2009/06/attribution-required/
[+] [-] esrauch|13 years ago|reply
[+] [-] VMG|13 years ago|reply
URGENT UPDATE We were seeing a significant drop in Google (organic) traffic for Server Fault after instituting this "follow links if enough upvotes post-edit or post-create" policy.
We traced it back to what we currently think are a string of posts on Server Fault that got nofollow removed through "trust", but were being interpreted by Google as link farms or spammy pages.
[....]
http://meta.stackoverflow.com/a/51156
[+] [-] bryanlarsen|13 years ago|reply
So it appears that they do remove rel=nofollow from reputable links, although their threshold for reputable appears to be very high.
[+] [-] preinheimer|13 years ago|reply
[+] [-] kamjam|13 years ago|reply
So not sure how this made it to the front page of HN. Not surprised the author doesn't address the response.
[+] [-] sologoub|13 years ago|reply
[+] [-] jere|13 years ago|reply
[+] [-] greenyoda|13 years ago|reply
[+] [-] zalambar|13 years ago|reply
[+] [-] streptomycin|13 years ago|reply
[+] [-] bryanlarsen|13 years ago|reply
[+] [-] jeremysmyth|13 years ago|reply
[+] [-] HPBEggo|13 years ago|reply
[+] [-] netcan|13 years ago|reply
[+] [-] bhanks|13 years ago|reply
1. This post is from 2011.
2. Google ended the ability to link sculpt a long time ago.
I would be interested in studies showing cases where link sculpting still worked.
[+] [-] kamjam|13 years ago|reply
[+] [-] robomartin|13 years ago|reply
[+] [-] JangoSteve|13 years ago|reply
[+] [-] SenorWilson|13 years ago|reply
[+] [-] eli|13 years ago|reply
[+] [-] walshemj|13 years ago|reply
[+] [-] theseanstewart|13 years ago|reply
[+] [-] patmcguire|13 years ago|reply
[+] [-] debacle|13 years ago|reply
The only possible reason I could see is spam avoidance.
[+] [-] 205guy|13 years ago|reply
This conversation has 2 angles I find very interesting. The first is the whole question of quality of the content. Google is a huge AI trying to give you the same quality of information that you'd select if you looked at the same (millions of) sources. At first, I liked netcan's answer[1], which says essentially, nofollow them all or none, Google will sort out its own. But the more I think about it, the more context that sites can provide automatically, the better. If a site can say "this content provided by untrusted source" (because we're a community-driven site, and we can't police everything), that's a help to Google's Algorithm. Google is free to still follow the link and perhaps, using its other knowledge of the target, assign a trustworthiness score to the nofollow quality on the original site. If a good community is providing good links, but the website owner still marks them nofollow, that might become a badge of honor in Google's eyes (all-crawling robot tentacles?).
As a side note, I love how everybody, from slimy SEO to internet ethicist, is trying to guess Google's secret formula (it used to be Coca-Cola that had a secret formula that was relevant). Google is essentially the unknowable proto-God of this new information universe.
The other interesting topic is getting paid for content. My first reaction to the OP was "wah, wah, OP doesn't want to pay for his little corner of the internet." Let's face it, the internet doesn't run for free (hardware and electricity), and the content that attracts people doesn't write itself. StackOverflow provides a community for people to deliver their content into and receive social rewards. If SO can't do that because of some quirk of SEO, then it's likely it wouldn't be able to host the community anymore. The internet has still not figured out a way to pay for that other than advertising (the forcing of unwanted content on readers). But advertising only works if you can keep track of views and stay on top of the SEO game. I totally agree they're hypocrites for hoarding the follows (must follow to us but we nofollow to everyone), but how's that different from a corporate board's fiduciary duty to shareholders to maximize the profit of the company? I also think cheald[2] makes a very good point about why the dynamics must be towards nofollow. The question is then how involved or committed are you to the community, and how do you feel rewarded for the time you contribute to it.
[1] http://news.ycombinator.com/item?id=4777260 [2] http://news.ycombinator.com/item?id=4774888
[+] [-] cstrat|13 years ago|reply
[+] [-] cstrat|13 years ago|reply
http://webcache.googleusercontent.com/search?q=cache:3sKnQOn...