item 5955043

Tell HN: The front page of Hacker News has been deindexed from Google

66 points | Roedou | 12 years ago

You can confirm this by searching for 'hacker news' in Google: the #1 ranking URL is /newest rather than the front page. This isn't term-specific; the site doesn't appear for other terms it usually ranks well for, such as "news.ycombinator.com" or "hn".

I've checked the usual technical culprits (canonical and robots meta tags in the HTML head, HTTP headers, robots.txt) but I don't see anything untoward.
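The checks listed above can be scripted. The sketch below is a minimal illustration, assuming you have already fetched the page HTML, the response headers and the robots.txt text yourself; `indexing_blockers` and `RobotsMetaScanner` are hypothetical names, and the code covers only the signals named above, not every reason a page can drop out of an index.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaScanner(HTMLParser):
    """Collects robots/googlebot meta tags and the canonical link from a page."""

    def __init__(self):
        super().__init__()
        self.meta = []        # list of (name, content) pairs, lowercased
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}  # HTMLParser lowercases attr names
        name = a.get("name", "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            self.meta.append((name, a.get("content", "").lower()))
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href")


def indexing_blockers(url, html, headers, robots_txt, agent="Googlebot"):
    """Return reasons (possibly none) why `agent` might not index `url`."""
    reasons = []

    scanner = RobotsMetaScanner()
    scanner.feed(html)
    for name, content in scanner.meta:
        directives = {d.strip() for d in content.split(",")}
        if directives & {"noindex", "none"}:
            reasons.append(f"meta {name}: {content}")
    if scanner.canonical and scanner.canonical != url:
        reasons.append(f"canonical points elsewhere: {scanner.canonical}")

    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    x_robots = lowered.get("x-robots-tag", "").lower()
    if "noindex" in x_robots:
        reasons.append(f"X-Robots-Tag: {x_robots}")

    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    if not rp.can_fetch(agent, url):
        reasons.append(f"robots.txt disallows {agent}")
    return reasons
```

An empty list means none of these particular signals explain the drop, which is what the post reports.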

I'll keep looking into it, but I'm posting this here in case the admins/mods have made any changes recently that could have had an effect. There's a possibility that the URL has been removed by Google for some particular reason, though I can't think of many pages that deserve it less than HN.

I'll update this thread if I see anything, but hopefully someone else will post an answer before I figure it out...

38 comments

Matt_Cutts | 12 years ago
It's not that PG has a grudge against Google (or vice versa) or anything like that. I believe that search engine bots crawl Hacker News hard enough that PG blocks most crawling by bots. In the case of Google, he does allow us to crawl from some IP addresses, but it's true that Google isn't able to crawl/index every page on Hacker News.
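The arrangement described here (most bots blocked, Google allowed from some IP addresses) amounts to an IP allowlist for crawlers. A minimal sketch of that idea, assuming the site keys on a self-identifying User-Agent plus the client IP; this is not HN's actual code, which isn't shown in the thread, and the ranges below are RFC 5737 documentation addresses standing in for real crawler ranges. Because User-Agent strings are trivially spoofed, the IP check is the part that does the real work.

```python
import ipaddress

# Hypothetical allowlist: RFC 5737 documentation ranges stand in for
# whatever crawler ranges a site operator actually trusts.
CRAWLER_ALLOWLIST = [
    ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.0/24")
]

# Crude, hypothetical heuristic for "this client says it is a bot".
BOT_MARKERS = ("bot", "crawler", "spider")


def looks_like_bot(user_agent):
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def allow_request(remote_ip, user_agent):
    """Humans always pass; self-identified bots pass only from allowlisted IPs."""
    if not looks_like_bot(user_agent):
        return True
    addr = ipaddress.ip_address(remote_ip)
    # Membership is simply False when IPv4/IPv6 versions differ.
    return any(addr in net for net in CRAWLER_ALLOWLIST)
```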

Here's a link where I answered the same question about three weeks ago: https://news.ycombinator.com/item?id=5837004 , so this isn't a new issue. In fact, PG has been blocking various bots since 2011 or so; https://news.ycombinator.com/item?id=3277661 is one of the original discussions about this.

And to show this isn't a Google-specific issue, note that Bing's #1 result for the search [hacker news] is a completely different site, thehackernews.com: http://www.bing.com/search?q=hacker+news

In general, I think PG's priority is to have a useful, interesting site for hackers. That takes precedence and is the reason why I believe PG blocks most bots: so that crawling doesn't overload the site.

Roedou | 12 years ago
Thanks for that Matt; I didn't see that recent post or your comment, so sorry for dragging you back here to repeat yourself.

Looks like I'm going to have to stop relying on searching 'hn' when using a different computer, and start typing in the full URL. First world problems are such a burden.

chintan | 12 years ago
Doesn't Googlebot respect Crawl-Delay in robots.txt? PG has set it to 30 seconds (https://news.ycombinator.com/robots.txt), which IMHO should not cause any load issues given HN's overall traffic profile.
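The arithmetic behind a 30-second delay: one fetch every 30 seconds caps a compliant crawler at 86400 / 30 = 2880 fetches per day. The sketch below reads the delay with Python's `urllib.robotparser` (`crawl_delay` needs Python 3.6+). The robots.txt text is a hypothetical stand-in modeled on the 30-second value mentioned above, not a copy of HN's file, and the code says nothing about whether a particular crawler actually obeys the directive.

```python
from urllib.robotparser import RobotFileParser

SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical robots.txt modeled on the 30-second delay described above;
# the real file may contain other rules.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 30
"""


def max_fetches_per_day(robots_txt, agent):
    """Upper bound on daily fetches for a crawler that honors Crawl-delay."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    delay = rp.crawl_delay(agent)
    if delay is None:
        return None  # no Crawl-delay applies to this agent
    return SECONDS_PER_DAY // int(delay)
```

Calling `max_fetches_per_day(SAMPLE_ROBOTS_TXT, "Googlebot")` gives 2880, i.e. one crawler honoring the delay cannot fetch more than that many pages a day.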
jlgreco | 12 years ago
Mmm, seems kind of like a feature. In fact, maybe PG should block Google entirely in robots.txt. HN seems to be getting mentioned in other media with increasing frequency. If you can't find the site just because Google doesn't list it, then I have to wonder what you are actually doing here. This wouldn't be the first way HN sets a bar for new users, either: the "Create Account" form is already hidden under "submit".
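Blocking Google entirely would take only a couple of robots.txt lines. A minimal sketch using a hypothetical robots.txt (not HN's real file) and Python's `urllib.robotparser` to confirm which agents it shuts out. One caveat: `Disallow` stops crawling, not indexing, so Google may still list a blocked URL (without a snippet) if other pages link to it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut Googlebot out completely and leave
# every other crawler under a 30-second Crawl-delay.
BLOCK_GOOGLE = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Crawl-delay: 30
"""

rp = RobotFileParser()
rp.parse(BLOCK_GOOGLE.splitlines())


def can_crawl(agent, url="https://news.ycombinator.com/"):
    """True if the sample robots.txt lets `agent` fetch `url`."""
    return rp.can_fetch(agent, url)
```

Here `can_crawl("Googlebot")` is False while `can_crawl("bingbot")` is True, because only the Googlebot group carries a `Disallow: /` rule.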

HNSearch works great for HN specific searches anyway.

JoeCortopassi | 12 years ago
This has happened before, and there's usually a non-pitchforky explanation (e.g. PG pulled it temporarily because of a network/server issue). I'm sure it will be back soon, and we will have a rather reasonable answer as to why. There are way too many Google employees who frequent and enjoy HN for it to be banned for some arbitrary reason.
AsymetricCom | 12 years ago
And of course, the network has specific functions for censorship as required by child protection laws. "Just a network error" really doesn't guarantee that the network wasn't doing something nefarious itself.
gee_totes | 12 years ago
If you are using DuckDuckGo, you can use the !hn bang to send your query to hnsearch.com
eliben | 12 years ago
This is trivial to do in any modern browser without DDG.
Roedou | 12 years ago
I found this old thread, where pg had blocked most of the Google bots, and it caused Google to think the site was down:

https://news.ycombinator.com/item?id=3277661

Could be a similar issue? I'll take a look.

Roedou | 12 years ago
pg also commented that he doesn't want traffic from Google anyway: https://news.ycombinator.com/item?id=5808990

In that case, he should add <meta name="googlebot" content="noindex"> to the HTML head of every page.
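A minimal sketch of adding that tag, assuming pages are post-processed as HTML strings before being sent (a hypothetical setup; HN's templates aren't shown in the thread, and `add_googlebot_noindex` is a hypothetical name). The same directive can also be sent as an HTTP header, `X-Robots-Tag: googlebot: noindex`. Googlebot only sees either form on pages it is allowed to crawl, so this approach and a robots.txt block don't combine.

```python
import re

NOINDEX_TAG = '<meta name="googlebot" content="noindex">'


def add_googlebot_noindex(html):
    """Insert the googlebot noindex meta tag right after the opening <head> tag."""
    if NOINDEX_TAG in html:
        return html  # already present; keeps the function idempotent
    # Match <head> or <head ...attrs...> (but not <header>), case-insensitively,
    # and only the first occurrence. Pages without a <head> come back unchanged.
    return re.sub(r"(<head\b[^>]*>)", r"\1" + NOINDEX_TAG, html,
                  count=1, flags=re.IGNORECASE)
```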

(I have to say, that's a smart way of avoiding any Eternal Septembering, but it'd be a shame. I often use Google to find old HN threads that I vaguely remember from months or years ago.)

meritt | 12 years ago
This is most likely the same reason Digg's front page was deindexed. There's no "content" per se; it's just links. Someone will notice, add an exception, and all will be well.

Unlike Digg, though, HN has a substantial amount of content in its comment pages, which are heavily indexed.

Edit: All the comment pages are still indexed just fine. It's /only/ the front page, which, IMO, doesn't really matter anyway.

Roedou | 12 years ago
That wasn't the reason for Digg's issue at all: Google had tried to manually deindex some pages from the site, but made a mistake and pulled the whole domain. They reincluded it shortly after.
pstuart | 12 years ago
The comments are the real content of this site.
aidscholar | 12 years ago
Sounds like overaggressive spam detection.
malandrew | 12 years ago
I had noticed this too. It's unfortunate, because searching Google with site:news.ycombinator.com in the query is much better than HN's own search when you have a good idea of what you're looking for (spearfishing search vs. BFS).
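A site-restricted query is just the `site:` operator prepended to the search terms. A minimal sketch that builds such a search URL with Python's `urllib.parse.urlencode`; `site_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def site_search_url(terms, site="news.ycombinator.com"):
    """Build a Google search URL restricted to one site via the site: operator."""
    query = f"site:{site} {terms}"
    # urlencode percent-encodes ':' as %3A and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, `site_search_url("robots.txt crawl-delay")` yields `https://www.google.com/search?q=site%3Anews.ycombinator.com+robots.txt+crawl-delay`.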
chacham15 | 12 years ago
This isn't the first time this has happened, and I suspect it won't be the last.
gscott | 12 years ago
The PageRank has fallen from 6 to 3 as well.
godgod | 12 years ago
Google is evil. Screw them. I refuse to use Google or their services. Make the switch. They deindex a lot of sites they don't agree with. Not saying that is the case here but they've been known to do it.