Show HN: Scrape.it – Change-Resilient Web Scraper

16 points | notastartup | 11 years ago | scrape.it | reply

16 comments

[+] pablohoffman|11 years ago|reply
Earlier this year we released a similar open-source tool for visual scraping, called Portia: https://github.com/scrapinghub/portia

It's been getting quite a bit of traction, and we're currently working on integrating it with the Scrapinghub platform (disclaimer: I work there) for those who prefer a hosted version.

[+] ryeon|11 years ago|reply
For simple web scraping, I find that kimonolabs.com does a perfectly fine job.
[+] notastartup|11 years ago|reply
I love what they are doing with kimono and import.io.

There's no free lunch, let's put it that way. It's free and simple, but limiting for anything heavier. It covers only a small portion of websites. You can't crawl all the links in a website, and it's hard to scrape data from dynamic webpages. Also, I found that some websites wouldn't even load, making it impossible to define the fields to scrape.

[+] jbob2000|11 years ago|reply
I can imagine some scenarios where you would use a web scraper, but I'm curious: what are people actually using a web scraper for? Does anyone have one in production?
[+] Mandatum|11 years ago|reply
I use one to scrape session times from local theatre websites. They don't have the capacity to build an API so I have an agreement with them that they keep the formats the same. I've set up a script which alerts them if they've screwed it up.
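
For anyone curious what a check like that could look like, here is a minimal sketch in Python. It is not the actual script described above: the class names `session-time` and `film-title`, the helper `missing_classes`, and the sample HTML are all hypothetical, and the "alert" is just a print where a real script would send an email or similar.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collects every CSS class name seen in the document."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.seen.update(value.split())


def missing_classes(html, required):
    """Return the required class names that no longer appear in the page."""
    collector = ClassCollector()
    collector.feed(html)
    return sorted(set(required) - collector.seen)


# Hypothetical class names the scraper is assumed to depend on.
REQUIRED = {"session-time", "film-title"}

# Hypothetical page where the session-time markup has been renamed.
sample = '<div class="film-title">Example</div><span class="time">19:30</span>'

problems = missing_classes(sample, REQUIRED)
if problems:
    # Stand-in for a real alert (email, chat message, etc.).
    print("Format changed, missing:", ", ".join(problems))
```

Checking for the presence of the markup the scraper relies on is cheaper than validating the scraped values themselves, and it fails loudly as soon as the agreed format drifts.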
[+] agersant|11 years ago|reply
I used to run one that I made for displaying a nice view of all the art posted on conceptart.org (a very large message board which I found cumbersome to navigate).
[+] martin-adams|11 years ago|reply
My product MyShopData.com uses web scraping for retailers to extract their own data and integrate with marketplaces.
[+] dailen|11 years ago|reply
Holy crap that's expensive!
[+] warkdarrior|11 years ago|reply
This is ripe for resellers to move in and make some dough.
[+] notastartup|11 years ago|reply
There's no metering (https://scrape.it/tour2), so you can create as many web scrapes as you want. Scrape.it constantly monitors each job to make sure the data extraction doesn't get interrupted when a website changes. Should the website change, your jobs get updated automatically so they continue working.