It's interesting that PageRank's measure of quality is entirely dependent on there being a community that recognizes the quality of the content first, before the search engine. Without a community, you're not going to get incoming links.
In other words, work produced by lonely geniuses is quite likely to go unnoticed.
For all we know, the content that is being produced by companies like Demand Media has already been produced by thoughtful people, writing at length about subjects they love on obscure websites that no one ever links to. What a shame that would be!
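To make the point about incoming links concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. The toy graph, the page names, and the damping factor of 0.85 are all illustrative assumptions; this is not a description of Google's production ranking, which uses many other signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of outbound links."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outbound links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each page splits its damped rank equally among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link among themselves, while
# "obscure_guide" links out but receives no incoming links at all.
toy_web = {
    "hub": ["blog", "forum"],
    "blog": ["hub"],
    "forum": ["hub", "blog"],
    "obscure_guide": ["hub"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:14s} {score:.4f}")
```

In this sketch the page with no incoming links never rises above the baseline share `(1 - damping) / n`, no matter how good its content is, which is the "lonely genius" effect described above.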
I've actually seen that happen to one of the lead engineers in search quality at Google. He'd written a great guide to ultralight backpacking that, until I linked to it, wasn't indexed by any major search engine.
> In other words, work produced by lonely geniuses is quite likely to go unnoticed.
It’s not quite as depressing as that. I recently made a quaint little site for a band and it has exactly zero other sites linking to it. It’s the first result when you search for the name of the band (which is town-name+generic-term-used-in-bandnames).
This only works with stuff that’s rare on the web, though. If there were other bands with the same name, and if someone linked to them, my little website would probably get swamped. (The same would presumably happen if someone were to write a blog post about the band – say, a scathing review of their last gig – and that one post got only a handful of links.) Hm, so getting a few links seems at least like a good defense in such cases. Luckily many of the band’s target demographic aren’t actually all that internet savvy :)
You imply that works of geniuses should be noticed, but geniuses are so esoteric, rare, and difficult to understand that most people wouldn't notice. Since the majority of people don't care about what geniuses care about, it's unlikely they'll appreciate it enough to link to it. If they do link to it, then "they" are probably a very small population of people, maybe a handful of other geniuses.
The Google PageRank algorithm is designed in such a way that the work of geniuses should go unnoticed. PageRank is designed for the masses. For the masses of consumers specifically.
Google is not designed for the geniuses. It's designed for people who want what everyone else wants.
In the beginning, when Google was a tool used primarily by geniuses, geniuses were the community. They were the masses that used Google. Its algorithms now pick selections from a new community. Bloggers who can copy/paste. Bloggers with lots of friends who will link to their posts because the friends are asked to and because other friends reciprocate.
Google doesn't know if you are linking to a web page because you like the web page or because someone who built the web page asked you to link to it or because you are getting paid.
The content produced by Demand Media is still spam, all the more effective as spam to the extent that it approximates thoughtful but obscure content.
The problem is that "indistinguishable" does not mean "identical". The Optimization-by-Proxy concept also applies to the way we recognize useful content and distinguish it from spam: if spam-creators exploit the gap between our perception of content and the actual quality of the content, they will ultimately create spam that fools even savvy users, and we will be influenced by it without even realizing it.
One of the characters in Neal Stephenson's "Anathem" described this phenomenon, occurring on his world's equivalent of the internet: sophisticated AI had led to spam (or "crap" as he called it) which was created by taking perfectly valid, reasonable ideas, combining them with falsehoods or biased information expressed clearly and reasonably, and releasing it in the form of real, substantive communications between users. A great deal of time and energy had to go into sorting "crap" from valid information.
> In other words, work produced by lonely geniuses is quite likely to go unnoticed.
I think this is something that has happened throughout history. The web probably makes it easier for their work to be uncovered than before, but they are still at a disadvantage.
Hi, I am the author of that. Would you say the depiction in the article is more-or-less accurate? I am asking as I wrote this purely from an outside/theoretical perspective.
My life is going from using a Google that used to give me useful results to one where "tar up website" returns this as the top result:
"Deep-sea ice crystals stymie Gulf oil leak fix - Yahoo! News
8 May 2010 ... thick blobs of tar began washing up on Alabama's white sand beaches. ... platform at the Deep Sea Horizon oil spill site in the Gulf"
At least a result from 4 days ago is an improvement on when I'd get usenet or mailing list results from 1999-2004 whenever I searched for anything linuxy.
Fascinating essay, but I'm not quite sure whether it's a problem that sufficiently advanced spam is indistinguishable from content.
After all, Demand Media does produce real, editorially vetted content from real human writers. The payment system encourages what I'll call extreme efficiency of research and writing, but that simply optimizes it for the handy-reference domain of search results (e.g. How to fillet a smallmouth bass), which may not be "high quality" as such but does provide direct, clearly written and reasonably valid responses to the search queries that elicit them.
I've seen a lot of pages where I couldn't tell whether they were written by a Markov model or a human. Many of the people who get paid to write $1 content don't speak English natively.
I'd put a finer point on it: paid writing encourages the creation of content which appears superficially relevant (especially through the eyes of a search engine), but doesn't actually convey any substantial information.
I'd suggest that it is a problem. It's something that Harry G. Frankfurt examined in his essay "On Bullshit" (http://en.wikipedia.org/wiki/On_bullshit and http://press.princeton.edu/titles/7929.html). I listened to an audio version of it and it was quite fascinating. As the Wikipedia article suggests, Frankfurt posits that bullshit is more corrosive than lies because bullshit bears no relation whatsoever to the truth.
This is exactly what makes Fox News, as an example, so dangerous. They don't care about the truth when they report; they only care about getting more eyeballs. I suspect that ANY spam that humans have to deal with to determine if it's useful is much the same.
Moultano - I have a strange request, but one I hope you'll take seriously.
I think this issue is very important - to Google, to web searchers, to businesses seeking to be found by Google and even to less scrupulous web operators. I'd love the opportunity to engage in a 20-30 minute written chat with you and publish it (anywhere on the web you'd like).
As background, I've worked for years as an SEO consultant, founded a community and company in the space (SEOmoz.org), and have been spending the last few years developing and launching search marketing software.
I certainly respect your background and beliefs, but I think there's some flawed logic in your assumptions and arguments that I'd love to dig into, talk about and maybe even have some of my own perceptions changed. I would not ask you to disclose anything that's confidential - I'm much more interested in the theory and logic behind web spam, SEO and search relevancy.
You can reach me via email - [email protected]. Would love to hear from you!
Haven't read it all, but I am just wondering: by now data dumps of people's connections are probably making the rounds in the dark channels? I think sending spam that appears to be from your friends could be a big "improvement", and should be child's play with the data that is already freely available.
Maybe that could become one of the first privacy disasters, when people realize they made their email unusable by publishing their connections.
If we presume that any algorithmic, procedural, or structural system built by one party can be reverse-engineered and understood by another party, the concept of Optimization by Proxy, and the more general Goodhart's law, form a pretty compelling argument against designing optimized systems as solutions to problems in general.
Maybe in some cases keeping a system convoluted and inconsistent can actually help ensure stability and durability?
Absolutely. Sometimes I mark as "spam" conversations that I'm personally not interested in, even if the author is "legitimately" spamming me (e.g. a misguided friend's mass email, or more likely the dozens of misguided reply-alls).
I think this is a valid use case of spam filters. I have trained more than one to detect my father's PowerPoint emails and bad chain-mail jokes and separate them from his personal messages that I actually want to read.
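For anyone curious what "training" a filter on personal preferences looks like, here is a rough sketch of a naive Bayes text classifier with add-one smoothing. The class name, the "junk"/"keep" labels, and the sample messages are all made up for illustration; real mail clients use more elaborate tokenization and features.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Tiny two-label naive Bayes classifier over whitespace-separated words."""

    def __init__(self):
        self.word_counts = {"junk": Counter(), "keep": Counter()}
        self.message_counts = Counter()

    def train(self, text, label):
        self.message_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def classify(self, text):
        vocab = set(self.word_counts["junk"]) | set(self.word_counts["keep"])
        total_messages = sum(self.message_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # Smoothed log prior, so an untrained label does not hit log(0).
            score = math.log(
                (self.message_counts[label] + 1)
                / (total_messages + len(self.word_counts))
            )
            total_words = sum(counts.values())
            for word in text.lower().split():
                # Add-one (Laplace) smoothing for words unseen under this label.
                score += math.log((counts[word] + 1) / (total_words + len(vocab) + 1))
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical training data: forwarded jokes and slide decks versus personal notes.
spam_filter = NaiveBayesFilter()
spam_filter.train("fwd fwd funny joke attached slides pps", "junk")
spam_filter.train("fwd hilarious chain joke forward to ten friends", "junk")
spam_filter.train("are you coming to dinner on sunday", "keep")
spam_filter.train("call me when you land love dad", "keep")

print(spam_filter.classify("fwd funny slides attached"))
print(spam_filter.classify("dinner on sunday"))
```

The filter never learns what spam "is" in general; it only learns which words this particular user tends to throw away, so the same sender can land in either pile depending on the message.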
[+] [-] adriand|16 years ago|reply
[+] [-] moultano|16 years ago|reply
http://eric-and-april.com/Ultralight/index.html
[+] [-] ugh|16 years ago|reply
[+] [-] fnid2|16 years ago|reply
Google doesn't know if you are linking to a web page because you like the web page or because someone who built the web page asked you to link to it or because you are getting paid.
And Google doesn't care.
[+] [-] Gormo|16 years ago|reply
[+] [-] dejb|16 years ago|reply
[+] [-] moultano|16 years ago|reply
[+] [-] alexandros|16 years ago|reply
[+] [-] jodrellblank|16 years ago|reply
"Deep-sea ice crystals stymie Gulf oil leak fix - Yahoo! News 8 May 2010 ... thick blobs of tar began washing up on Alabama's white sand beaches. ... platform at the Deep Sea Horizon oil spill site in the Gulf"
At least a result from 4 days ago is an improvement on when I'd get usenet or mailing list results from 1999-2004 whenever I searched for anything linuxy.
:/
[+] [-] MikeCapone|16 years ago|reply
[+] [-] RyanMcGreal|16 years ago|reply
[+] [-] moultano|16 years ago|reply
[+] [-] duskwuff|16 years ago|reply
[+] [-] halostatue|16 years ago|reply
[+] [-] byrneseyeview|16 years ago|reply
[+] [-] pook|16 years ago|reply
http://hamstermotor.motime.com/post/683104/the-future-of-spa...
[+] [-] alexandros|16 years ago|reply
[+] [-] randfish|16 years ago|reply
[+] [-] moultano|16 years ago|reply
[+] [-] Tichy|16 years ago|reply
[+] [-] Gormo|16 years ago|reply
[+] [-] samg|16 years ago|reply
[+] [-] BoppreH|16 years ago|reply
And sufficiently advanced errors are indistinguishable from pages made for pure irony.
[+] [-] moultano|16 years ago|reply
[+] [-] diN0bot|16 years ago|reply
[+] [-] alextp|16 years ago|reply
[+] [-] Tichy|16 years ago|reply