"Sufficiently advanced spam is indistinguishable from content"

84 points | moultano | 16 years ago | lesswrong.com

38 comments

[+] adriand|16 years ago|reply
It's interesting that PageRank's measure of quality is entirely dependent on there being a community that recognizes the quality of the content first, before the search engine. Without a community, you're not going to get incoming links.

In other words, work produced by lonely geniuses is quite likely to go unnoticed.

For all we know, the content that is being produced by companies like Demand Media has already been produced by thoughtful people, writing at length about subjects they love on obscure websites that no one ever links to. What a shame that would be!
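
To make the dependence on incoming links concrete, here is a minimal sketch of the simplified, textbook power-iteration form of PageRank. It is not a description of what Google actually runs; the function name `pagerank`, the toy graph `toy_web`, its page names, and the damping value of 0.85 are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        # Every page splits its damped rank equally among its link targets.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "obscure" links out, but nobody links to it.
toy_web = {
    "hub": ["popular", "copycat"],
    "popular": ["hub"],
    "copycat": ["popular"],
    "obscure": ["popular"],
}
scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page:8s} {score:.4f}")
```

In this sketch the page with no inbound links never rises above the baseline share `(1 - damping) / n`, whatever its content says, which is the point being made about work that no community has linked to yet.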

[+] moultano|16 years ago|reply
I've actually seen that happen to one of the lead engineers in search quality at Google. He'd written a great guide to ultralight backpacking that, until I linked to it, wasn't indexed by any major search engine.

http://eric-and-april.com/Ultralight/index.html

[+] ugh|16 years ago|reply
> In other words, work produced by lonely geniuses is quite likely to go unnoticed.

It’s not quite as depressing as that. I recently made a quaint little site for a band and it has exactly zero other sites linking to it. It’s the first result when you search for the name of the band (which is town-name+generic-term-used-in-bandnames).

This only works with stuff that’s rare on the web, though. If there were other bands with the same name and if someone linked to them, my little website would probably get swamped. (The same would presumably happen if someone were to write a blog post about the band – say, a scathing review of their last gig – and if that one post got only a handful of links.) Hm, so getting a few links seems like a good defense in such cases, at least. Luckily many of the band’s target demographic aren’t actually all that internet savvy :)

[+] fnid2|16 years ago|reply
You imply that works of geniuses should be noticed, but geniuses are so esoteric, rare, and difficult to understand that most people wouldn't notice. Since the majority of people don't care about what geniuses care about, it's unlikely they'll appreciate it enough to link to it. If they do link to it, then "they" are probably a very small population of people, maybe a handful of other geniuses themselves.

The google page rank algorithm is designed in such a way that the work of geniuses should go unnoticed. Pagerank is designed for the masses. For the masses of consumers specifically.

Google is not designed for the geniuses. It's designed for people who want what everyone else wants.

In the beginning, when google was a tool used primarily by geniuses, geniuses were the community. They were the masses that used google. Its algorithms now pick selections from a new community. Bloggers who can copy/paste. Bloggers with lots of friends who will link to their posts because the friends are asked to and because other friends reciprocate.

Google doesn't know if you are linking to a web page because you like the web page or because someone who built the web page asked you to link to it or because you are getting paid.

And google doesn't care.

[+] Gormo|16 years ago|reply
The content produced by Demand Media is still spam, all the more effective as spam to the extent that it approximates thoughtful but obscure content.

The problem is that "indistinguishable" does not mean "identical". The Optimization-by-Proxy concept also applies to the way we recognize useful content and distinguish it from spam: if spam-creators exploit the gap between our perception of content and the actual quality of the content, they will ultimately create spam that fools even savvy users, and we will be influenced by it without even realizing it.

One of the characters in Neal Stephenson's "Anathem" described this phenomenon, occurring on his world's equivalent of the internet: sophisticated AI had led to spam (or "crap" as he called it) which was created by taking perfectly valid, reasonable ideas, combining them with falsehoods or biased information expressed clearly and reasonably, and releasing it in the form of real, substantive communications between users. A great deal of time and energy had to go into sorting "crap" from valid information.

[+] dejb|16 years ago|reply
> In other words, work produced by lonely geniuses is quite likely to go unnoticed.

I think this is something that has happened throughout history. The web probably makes it easier for their work to be uncovered than before, but they are still at a disadvantage.

[+] moultano|16 years ago|reply
I work on search-quality at Google. This is my life.
[+] alexandros|16 years ago|reply
Hi, I am the author of that. Would you say the depiction in the article is more-or-less accurate? I am asking as I wrote this purely from an outside/theoretical perspective.
[+] jodrellblank|16 years ago|reply
My life is going from using a Google that used to give me useful results to one where "tar up website" returns this as the top result:

"Deep-sea ice crystals stymie Gulf oil leak fix - Yahoo! News 8 May 2010 ... thick blobs of tar began washing up on Alabama's white sand beaches. ... platform at the Deep Sea Horizon oil spill site in the Gulf"

At least a result from 4 days ago is an improvement on when I'd get usenet or mailing list results from 1999-2004 whenever I searched for anything linuxy.

:/

[+] RyanMcGreal|16 years ago|reply
Fascinating essay, but I'm not quite sure whether it's a problem that sufficiently advanced spam is indistinguishable from content.

After all, Demand Media does produce real, editorially vetted content from real human writers. The payment system encourages what I'll call extreme efficiency of research and writing, but that simply optimizes it for the handy-reference domain of search results (e.g. How to fillet a smallmouth bass), which may not be "high quality" as such but does provide direct, clearly written and reasonably valid responses to the search queries that elicit them.

[+] moultano|16 years ago|reply
I've seen a lot of pages where I couldn't tell if they were written by a Markov model or a human. Many of the people who get paid for $1 content don't speak English natively.
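
For readers who haven't met the term, this is roughly what a word-level Markov model does: it records which words follow which in some source text, then strings words together by repeatedly picking a recorded follower at random. The sketch below is a toy under stated assumptions; `build_chain`, `generate`, and the sample sentence are made up for illustration and are not taken from any real spam tool.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, seed=0):
    """Walk the chain from `start`, choosing each next word at random."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    output = [start]
    for _ in range(length - 1):
        followers = chain.get(output[-1])
        if not followers:
            break  # dead end: this word never had a successor
        output.append(rng.choice(followers))
    return " ".join(output)


# Hypothetical source text, loosely in the style of how-to content.
sample = "how to fillet a bass and how to clean a bass and how to cook a fish"
chain = build_chain(sample)
generated = generate(chain, "how", 12)
print(generated)
```

Every adjacent word pair in the output also occurs in the source, so the result is locally plausible while carrying no intent, which is why such text can be hard to tell apart from hurried human writing.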
[+] duskwuff|16 years ago|reply
I'd put a finer point on it: paid writing encourages the creation of content which appears superficially relevant (especially through the eyes of a search engine), but doesn't actually convey any substantial information.
[+] halostatue|16 years ago|reply
I'd suggest that it is a problem. It's something that Harry G. Frankfurt examined in his essay "On Bullshit" (http://en.wikipedia.org/wiki/On_bullshit and http://press.princeton.edu/titles/7929.html). I listened to an audio version of it and it was quite fascinating. As the Wikipedia article suggests, Frankfurt posits that bullshit is more corrosive than lies because bullshit bears no relation whatsoever to the truth.

This is exactly what makes Fox News, as an example, so dangerous. They don't care about the truth when they report; they only care about getting more eyeballs. I suspect that ANY spam that humans have to deal with to determine if it's useful is much the same.

[+] randfish|16 years ago|reply
Moultano - I have a strange request, but one I hope you'll take seriously.

I think this issue is very important - to Google, to web searchers, to businesses seeking to be found by Google and even to less scrupulous web operators. I'd love the opportunity to engage in a 20-30 minute written chat with you and publish it (anywhere on the web you'd like).

As background, I've worked for years as an SEO consultant, founded a community and company in the space (SEOmoz.org), and have been spending the last few years developing and launching search marketing software.

I certainly respect your background and beliefs, but I think there's some flawed logic in your assumptions and arguments that I'd love to dig into, talk about and maybe even have some of my own perceptions changed. I would not ask you to disclose anything that's confidential - I'm much more interested in the theory and logic behind web spam, SEO and search relevancy.

You can reach me via email - [email protected]. Would love to hear from you!

[+] moultano|16 years ago|reply
Sorry, I'm not equipped for that sort of public discussion. Talk to Matt. ;)
[+] Tichy|16 years ago|reply
Haven't read it all, but I am just wondering: by now data dumps of people's connections are probably making the rounds in the dark channels? I think sending spam that appears to be from your friends could be a big "improvement", and should be child's play with the data that is already freely available.

Maybe that could become one of the first privacy disasters, when people realize they made their email unusable by publishing their connections.

[+] Gormo|16 years ago|reply
If we presume that any algorithmic, procedural, or structural system built by one party can be reverse-engineered and understood by another party, the concept of Optimization by Proxy, and the more general Goodhart's law, form a pretty compelling argument against designing optimized systems as solutions to problems in general.

Maybe in some cases keeping a system convoluted and inconsistent can actually help ensure stability and durability?

[+] samg|16 years ago|reply
Just ask Calacanis!
[+] diN0bot|16 years ago|reply
absolutely....sometimes i mark as "spam" conversations that i'm personally not interested in, even if the author is "legitimately" spamming me. (eg a mis-guided friend's mass email...or more likely the dozens of mis-guided reply-all's)
[+] alextp|16 years ago|reply
I think this is a valid use case of spam filters. I have trained more than one to detect my father's powerpoint emails and bad chain-mail jokes and separate them from his personal messages that I actually want to read.
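
As a rough idea of how a filter can be trained this way, here is a minimal naive Bayes sketch with add-one smoothing. It is an assumption about the general technique, not a description of any particular mail client; the class `NaiveBayesFilter`, the labels, and the example messages are all hypothetical.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Tiny two-class naive Bayes text classifier with add-one smoothing."""

    LABELS = ("unwanted", "wanted")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = Counter()

    def train(self, text, label):
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def score(self, text, label):
        # Assumes at least one training message exists for each label.
        counts = self.word_counts[label]
        vocab = set()
        for label_counts in self.word_counts.values():
            vocab.update(label_counts)
        total_words = sum(counts.values())
        total_messages = sum(self.message_counts.values())
        log_prob = math.log(self.message_counts[label] / total_messages)
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            log_prob += math.log((counts[word] + 1) / (total_words + len(vocab)))
        return log_prob

    def classify(self, text):
        return max(self.LABELS, key=lambda label: self.score(text, label))


# Hypothetical training messages from a single sender.
mail_filter = NaiveBayesFilter()
mail_filter.train("fwd: funny joke powerpoint attached", "unwanted")
mail_filter.train("fwd: fwd: chain mail pass this joke on", "unwanted")
mail_filter.train("are you coming to dinner on sunday", "wanted")
mail_filter.train("call me when you land", "wanted")

print(mail_filter.classify("fwd: another funny powerpoint"))
print(mail_filter.classify("are you coming on sunday"))
```

Because the classifier only looks at word frequencies per label, it can separate two kinds of mail from the same sender, as long as the forwarded jokes and the personal notes tend to use different words.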
[+] Tichy|16 years ago|reply
Also all the newsletters companies feel entitled to send just because you bought a toothpick from them 10 years ago.