
Jason Calacanis Knows He's Spamming Google, He Just Thinks It's No Big Deal

157 points | mvandemar | 16 years ago | smackdown.blogsblogsblogs.com

83 comments

[+] dangrossman|16 years ago|reply
While it's always a bit annoying to see a meme take up article slots on Hacker News regularly, I don't mind this one. I see sites like Mahalo as doing a massive disservice to huge numbers of people:

* thousands of AdWords advertisers that have paid for their ads to be matched with content sites, not scraper pages with a huge ad to text ratio

* thousands of publishers whose work is being scraped, aggregated and outranked without as much as a backlink

* millions of web searchers that are hitting these pages instead of the real sources of the content they were searching for

And calling out companies that harm the fabric of the web for everyone else is worth doing.

[+] w00pla|16 years ago|reply
The question is: why doesn't Google do anything about this? There are countless such pages. Why not just blacklist domains? Or artificially give them lower PageRank?

Or is it okay as long as they get ad-sense money?

[+] qeorge|16 years ago|reply
It's really a shame how low Google's standards are for AdSense. I've tried advertising in the content network several times, but each time I end up wasting so much time blocking MFA (made-for-AdSense) sites that I give up.
[+] wheaties|16 years ago|reply
The rub is, if Google were to block all of Mahalo, a lawsuit would surely be filed.
[+] danik|16 years ago|reply
Although I understand your points, the AdWords and search issues are Google's problem. People pay for a product with flaws in it, and Google is the one selling it, not Mahalo. Mahalo has the freedom to do whatever they want with their pages, and they can't be held responsible for what Google in its turn does with them.

Now, stealing other people's work, that is obviously Mahalo's doing and should be a case for the court.

I'm not really sure what the point of this article is. Mahalo spams Google - so what? It's up to them to do this and up to Google to prevent it. Business is business, even on the internet.

[+] aresant|16 years ago|reply
Here's one thing to consider: Google's algorithms for detecting the actual value of clicks for advertisers have improved greatly over the past few years.

If Mahalo’s traffic was utter crap, it would be dropped.

[+] jsz0|16 years ago|reply
It's really too bad Google doesn't allow you to simply blacklist domains from your search results permanently. The thing that frustrates me the most about these spam sites is the fact they're constantly popping up, and my downvoting of the result seems to do absolutely nothing unless I'm using the exact same search query. Just let me blacklist Mahalo and other sites like it permanently. Better yet, make it possible to subscribe to a blocklist so the community can pool its resources and fight back.
[+] Devilboy|16 years ago|reply
I request this from Google every couple of months. If I could remove an entire domain from all my personalized search results, I'd be soooo happy.
[+] japherwocky|16 years ago|reply
Jason Calacanis is the Paris Hilton of the web.

Mahalo is not particularly interesting, not particularly evil, he doesn't really do anything, and yet - we keep talking about him and putting his shit on the front page of hacker news.

It's a mediocre aggregator/linkfarm, with some mechanical turk style incentives for humans to contribute, and a nice chunk of $ in the bank. Just ignore/mock him for another year or two until the funding runs dry.

[+] mvandemar|16 years ago|reply
You are absolutely right, it is mediocre crap. The issue is that through the spam techniques described in the article, he can now gain undeserved top 10 rankings for many phrases using pages that have no business being there.

If you are not part of the web development community then these discussions will most likely bore the hell out of you. However, if you are, and if you are aware of how many innocent sites Google bans or penalizes on a daily basis, or AdSense accounts that get canceled with no appeal for offenses much less than his, then this stuff actually matters.

[+] moe|16 years ago|reply
One of the things I really miss in google is a persistent blocking preference a.k.a. site blacklist. Mahalo would go straight in there, along with expert-sexchange, sedo parking and a few others.
[+] ryoshu|16 years ago|reply
Depending on the browser you use, this is easy to do. I have experts-exchange and other useless sites blocked using greasemonkey/greasemetal.
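
For anyone curious what that kind of filtering boils down to, here is a minimal sketch in Python rather than an actual Greasemonkey script. The function names, the sample URLs and the blocklist contents are all made up for illustration; a real userscript would do the same host check against the links on the results page.

```python
from urllib.parse import urlparse

def is_blocked(url, blocklist):
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blocklist)

def filter_results(urls, blocklist):
    """Keep only the result URLs whose host is not on the blocklist."""
    return [url for url in urls if not is_blocked(url, blocklist)]

# Hypothetical blocklist and result URLs, for illustration only.
blocklist = {"mahalo.com", "experts-exchange.com"}
results = [
    "http://www.mahalo.com/some-topic",
    "http://example.org/original-article",
    "http://www.experts-exchange.com/question",
]

print(filter_results(results, blocklist))
```

Matching on the parsed hostname (exact match or a ".domain" suffix) rather than on a plain substring avoids blocking unrelated sites that merely contain the blocked name. A shared blocklist would just be this set loaded from a common file.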
[+] jachee|16 years ago|reply
SEO is snake oil. This is further proof that the whole "industry" is ruining the integrity and usefulness of the internet.
[+] aaronwall|16 years ago|reply
That is the angle Jason used when he created his steaming pile. It doesn't mean that anyone in the industry agrees with Jason's strategy.

And if you want to place the blame where it belongs remember that Google is the company funding all this content scraping with their ads programs.

I just tried searching for a recent post from an official Google blog (about AdSense using referral data for more relevant ad targeting) and found a scraper site with their ads outranking them for their own content. Pretty sad.

[+] lmkg|16 years ago|reply
Not everything that uses the phrase "SEO" should be painted with the same brush. A lot of SEO best practices are making the HTML markup more semantic and human-readable. White-hat SEO dovetails pretty well with a human-readable web and accessibility standards, and is one of the better business cases for kicking the Flash habit. Just because Calacanis makes his name by abusing loopholes in the algorithms doesn't mean that's the only sort of thing that the name "SEO" applies to.
[+] axod|16 years ago|reply
>> "Currently when I look, Google tells me that Mahalo has 356,000 pages indexed"

I see 'Results 1 - 10 of about 2,200,000 from mahalo.com'

Have things stepped up a gear or am I misunderstanding?

[+] mvandemar|16 years ago|reply
No, different datacenters will show different results... sometimes very different. It also matters if you are visiting Google.com or one of the country variants.

According to what Jason said in another comment, however, all of their pages are listed in their xml sitemap, and all of those are listed in a master xml sitemap index located here (warning! huge files if you follow the links in the first one!):

http://www.mahalo.com/sitemapindex.xml

Based on what I saw 2 million+ looks like a huge overestimate, if what Jason said is true.

Edit: My bad, frederickcook's answer was the right one. I didn't realize you were doing a regular text search.
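
As a rough illustration of how one could count the pages listed through a sitemap index, here is a small Python sketch. It only parses XML that has already been downloaded; the sample documents and example.com URLs are invented stand-ins, not Mahalo's real files, and the function names are my own.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def child_sitemaps(index_xml):
    """Return the sitemap URLs listed in a sitemap index document."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]

def count_urls(sitemap_xml):
    """Return the number of <url> entries in a single sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return len(root.findall("sm:url/sm:loc", NS))

# Invented sample data standing in for downloaded files.
INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://example.com/sitemap-1.xml</loc></sitemap>
  <sitemap><loc>http://example.com/sitemap-2.xml</loc></sitemap>
</sitemapindex>"""

SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/page-a</loc></url>
  <url><loc>http://example.com/page-b</loc></url>
  <url><loc>http://example.com/page-c</loc></url>
</urlset>"""

print(child_sitemaps(INDEX))
print(count_urls(SITEMAP))
```

Summing `count_urls` over every file returned by `child_sitemaps` gives the number of pages submitted, which can then be compared with the figure a site: query reports.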

[+] frederickcook|16 years ago|reply
Search "site:mahalo.com"

Simply "mahalo.com" lists every indexed page with that text on it, such as this one.

[+] jasonmcalacanis|16 years ago|reply
this is getting really old and we're not interested in doing anything black hat or even gray hat. as such we're doing the following:

1. we're removing (or building out) any page in our system created by our users with under 200 words of original content. This will take a couple of weeks but it's started.

2. we're not letting users create stub pages (short pages) until we can noindex them and put them in a different directory (i.e. /stubs/) so google can easily tell the difference between them.
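
A minimal sketch of what those two rules could look like, purely as an illustration and not Mahalo's actual code. The function names are hypothetical, and a plain word count is only a stand-in, since it cannot tell whether the words are original. The 200-word threshold and the /stubs/ directory come from the points above; the robots meta tag is the standard way to ask search engines not to index a page.

```python
MIN_WORDS = 200  # threshold taken from point 1 above

def is_stub(text, min_words=MIN_WORDS):
    """Treat a page as a stub if it has fewer than min_words whitespace-separated words."""
    return len(text.split()) < min_words

def robots_meta(text):
    """Return a robots meta tag: noindex for stubs, normal indexing otherwise."""
    if is_stub(text):
        return '<meta name="robots" content="noindex">'
    return '<meta name="robots" content="index, follow">'

def page_path(slug, text):
    """Place stub pages under /stubs/ so they are easy to tell apart."""
    return f"/stubs/{slug}" if is_stub(text) else f"/{slug}"

short_page = "just a few words"
print(robots_meta(short_page))
print(page_path("some-topic", short_page))
```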

these pages are < 1% of our revenue and low single digits of our traffic. we don't benefit from them materially, and I think we're being targeted by Aaron Wall and other SEOs for my "seo is bullshit" comment from 2005 or so.

I guess that is fine... I gotta live with the ramifications of what I say. however, for the record I don't believe that SEO is BS any more... when i said that it was when we were building joystiq and autoblog and we spent zero time on SEO.

All that being said, we're being targeted by a small group of folks who want to take us down. we're only going to get stronger from this because our hundreds of contributors are rallying around building out the short pages.

Topix, Kosmix, NYTimes and Zimbio are all making quality topic pages and are not getting attacked over it. not sure why there is some double standard.

regardless.... this is not a material thing for us. we're flushing all these pages and moving them to a different directory going forward so that search engines know where they are located (i.e. /stubs/ ).

thanks for the ass kicking.... having a horrible day today over this.

jcal

http://bit.ly/jasondown

[+] mvandemar|16 years ago|reply
Jason, did you even read my article? This isn't about the traffic those autogenerated pages get, it's about the fact that through the minuscule amounts of PageRank that they are each capable of grabbing, you are now able to rank your mediocre pages with absolutely zero influence from the rest of the web.

We're not talking about stub pages, it's all the fully automated bullshit that you are generating. They not only need to be deindexed, they need to be nofollowed or removed altogether.

How is it you are out there playing the wounded puppy when apparently you haven't even read the articles or followed the reference links? You can't just skim this one and then craft a rebuttal and think you've addressed the issue. There's a lot of data in those paragraphs you apparently just skimmed over (if even that).

You have over 500,000 pages listed in your xml sitemap, and Google appears to have over 330,000 of them indexed. Click on this link, please, and actually go look at 10-12 of the pages we are talking about here:

http://tinyurl.com/yzmxq7b

Tell me how long it takes you, just by clicking through, to find even 3 pages that have any human interaction in them whatsoever.

Maybe, just maybe, you really don't have a clue what is happening. I personally don't believe that's the case, but if so then whoever it is you have working for you that set this up knows how to spam like a pro.

[+] tdm911|16 years ago|reply
we're removing (or building out) any page in our system created by our users with under 200 words of original content. This will take a couple of weeks but it's started.

or

i'm also getting a list of every page under 300 words and having the page managers build them out in 30 days or deleting them.

from: http://news.ycombinator.com/item?id=1143512

is it under 200 words or under 300? are the goal posts moving already?

[+] chintan|16 years ago|reply
Try site:kosmix.com or site:righthealth.com

Kosmix always had noindex in their "search results" - Now stop whining like a baby and get your ass back to work instead of justifying your mistakes.

[+] Joepuf|16 years ago|reply
Jason - the "seo is bs" line was heard at every seo conference you headlined when you launched mahalo.
[+] CoachRufus87|16 years ago|reply
this is getting very, very old. can we move on? please??
[+] prawn|16 years ago|reply
It's obviously of interest and importance to a number of people involved in this field or troubled by poor quality material showing up in Google. If you're not one of those people, it's pretty easy to identify these links and not upvote them or visit the articles/comments, etc.

There are countless articles on HN that I have no interest in (e.g., I don't even know what Clojure is), but I just don't click through to them.

[+] fjabre|16 years ago|reply
Think this thread has gotten way out of hand.

A lot of the comments in here seem like more of a personal attack than anything else. You might as well change the title of this post to "Jason Calacanis ruined the Internet" or something to that effect.

Can we stop the drama already? I think we're going to need a hose to control this mob.