Strange popup:
"Hello, i see you are coming from hacker news.
the article you clicked on was most certainly not submitted by nodejitsu.
news.ycombinator has a long history of squashing articles and submitters that aren't funded by y-comb.
most of this is done through their "silent" banning and censoring mechanisms, that leave people not even realizing they have been silenced.
i hope you enjoy this article, and remember that HN is extremely biased and that you should keep your horizons broad."
While I would agree that HN is biased towards YC-funded projects, I would not agree that it is biased against non-YC projects or news. In fact, the majority of the items on HN are non-YC, and the same holds for submitters and commenters over the year or more I've been here.
On a different note: Hpricot is no longer representative of Ruby scraping - nokogiri (http://nokogiri.org/) is where it's at, and it has an Hpricot compatibility layer if you need to switch. Even though I've settled on Python for everything else, I still go back to Ruby just for nokogiri when it comes to scraping.
Marak, the guy behind Nodejitsu (and, presumably, the popup message), is known to exhibit asshole behaviour. (See: http://news.ycombinator.com/item?id=1448309) The popup message is consistent with what HN knows of him.
Whether being a jerk justifies banning I can't say - but his assertion that HN is biased has little justification (particularly when you consider that the writer himself is biased). Kindly ignore.
I'm not sure how you can be biased toward something without being biased against everything else, but the spirit of your meaning seems true. YC companies and submitters have an institutional advantage, but I am not with a YC company and don't feel discriminated against.
Hey guys. The Nodejitsu team and Marak (http://www.github.com/Marak), the guy behind Nodejitsu, are perma-banned from HN and can't respond to your queries.
He sends his regards, and if you'd like to contact him visit the #Node.js IRC channel @ Freenode
You'd have to parse the page separately and run each inline or linked script in a sandbox that can talk to jsdom, but it could be done.
The article lists BeautifulSoup as the Python choice for scraping, but that isn't necessarily true. I'm using http://scrapy.org/, for example, which is a scraping framework that uses lxml and xpath by default.
The article reads "The challenge with using these libraries is that they all have their own quirks that can make working with HTML, CSS and Javascript challenging."
And that's true only if you want to do your page manipulation in Javascript. I'm perfectly happy doing my page manipulation in Ruby with Nokogiri. Here's an example:
(code formatting on HN sucks, so it's on my blog, apologies)
Yes, as clichéd as it is, I think it's time. When the node challenge top 10 hit the HN front page, I couldn't use at least 6 of the 10, and the rest were beset by bugs and didn't work - the pixel one where you form characters stopped showing the shape I was supposed to be making after a few rounds, and the robot-war one never let me buy or release my wave of robots on Chrome or Firefox. Overall it was a totally disappointing experience.
il | 15 years ago:
As far as I know this is impossible with any other server-side scraping technology.
If so, that would be amazingly useful for a couple of my side projects, much easier than parsing their Javascript code and extracting the info I need.
fizx | 15 years ago:
http://github.com/fizx/pquery#readme
tcarnell | 15 years ago:
For portability, performance and flexibility I finally settled on writing my own HTML parser and CSS selection engine from scratch.
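A from-scratch engine along those lines can start surprisingly small. This toy sketch in plain Ruby (no gems; the node structure and selector grammar are invented, supporting only `tag`, `.class`, and `#id` selectors) illustrates the core idea of matching plus depth-first traversal:

```ruby
# Toy DOM node; a real engine would build these from an HTML tokenizer.
Node = Struct.new(:tag, :attrs, :children) do
  # Does this single node match a simple selector?
  def matches?(selector)
    case selector
    when /\A#(.+)\z/  then attrs['id'] == $1
    when /\A\.(.+)\z/ then attrs.fetch('class', '').split.include?($1)
    else tag == selector
    end
  end

  # Depth-first search over the tree, like a minimal css/at_css.
  def select(selector)
    found = matches?(selector) ? [self] : []
    found + children.flat_map { |c| c.select(selector) }
  end
end

tree = Node.new('div', {}, [
  Node.new('p', { 'class' => 'intro lead' }, []),
  Node.new('p', { 'id' => 'footer' }, [])
])

puts tree.select('.intro').length
puts tree.select('#footer').first.tag
```

Descendant combinators, attribute selectors, and a proper tokenizer are where the real work is, but the select-by-predicate skeleton stays the same.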
knowtheory | 15 years ago:
http://blog.knowtheory.net/post/1074676060/xml-manipulation-...