jakubbalada's comments

jakubbalada | 9 years ago | on: Web Scraping: Bypassing “403 Forbidden,” captchas, and more

Both: developers on a free plan building their own RSS feeds for sites without one, and business people (mainly startups) building their products on top of Apifier.

A typical use is an aggregator that needs a common API for all partners who can't provide one themselves. With Apifier they can have an API running in an hour. It might break once in a while - then you have to update your crawler (not that often if you use internal AJAX calls).

jakubbalada | 9 years ago | on: Web Scraping: Bypassing “403 Forbidden,” captchas, and more

Disclaimer: I'm a co-founder of Apifier [1].

It's not open source, but it's free up to 10k pages per month. And it can handle modern JavaScript web applications (your code runs in the context of the crawled page). You can, for example, scrape an API key first and then use internal AJAX calls.
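The "scrape a key, then call the internal AJAX endpoint" idea can be sketched like this. This is a hypothetical illustration, not Apifier's actual API: the `apiKey` pattern and the `/internal/data` endpoint are invented for the example.

```javascript
// Sketch: extract an API key embedded in a page's HTML, then build
// the URL of the site's internal AJAX endpoint to call directly.
// Both the key pattern and the endpoint path are assumptions.

function extractApiKey(html) {
  // Look for something like: var apiKey = "abc123";
  const match = html.match(/apiKey\s*[:=]\s*["']([A-Za-z0-9]+)["']/);
  return match ? match[1] : null;
}

function buildAjaxUrl(baseUrl, apiKey) {
  // Reuse the key to hit the internal JSON endpoint instead of
  // parsing rendered HTML on every page.
  return `${baseUrl}/internal/data?key=${encodeURIComponent(apiKey)}`;
}
```

Calling the site's own JSON endpoints is usually more stable than scraping markup, which is why the crawler "doesn't break that often" with this approach.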

There's also a community page [2] where you can find and reuse crawlers made by other users.

[1] https://www.apifier.com [2] https://www.apifier.com/community/crawlers

jakubbalada | 10 years ago | on: Our Experience of the Inaugural Y Combinator Fellowship

(Apifier co-founder here) It will be less beneficial now than it was in the first YCF batch. The other fellows really motivated us during the group office hours and other events. Still, the main thing is to focus on the product, which you can do anywhere.

jakubbalada | 10 years ago | on: Show HN: Apifier – hosted web crawler for developers

You're right, a price per request would be easier to estimate. But since you can use JavaScript, you can scrape a whole website with just one page request (see the SFO flights example). In other words, our costs don't correlate with page requests, but with data transfer.

A flat fee is also possible, but we think it's fair for users to pay based on their consumption.

jakubbalada | 10 years ago | on: Show HN: Apifier – hosted web crawler for developers

Yes, by default we respect robots.txt. There is a switch to disable it - at your own risk. We don't fully respect Crawl-delay, but the minimum delay between requests for all our crawlers is set to 2000 ms. We don't publish our IP ranges yet.
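A minimum-delay floor like the 2000 ms described above can be sketched as a small throttle. This is a minimal illustration of the idea, not Apifier's implementation; the function names are invented for the example.

```javascript
// Sketch: a throttle that tells the caller how long to wait before
// the next request, enforcing a fixed minimum delay between requests.
// The clock is injectable so the logic can be tested without sleeping.

function makeThrottle(minDelayMs, now = Date.now) {
  let nextAllowed = 0; // earliest timestamp the next request may fire
  return function waitTime() {
    const t = now();
    const wait = Math.max(0, nextAllowed - t);
    nextAllowed = Math.max(t, nextAllowed) + minDelayMs;
    return wait; // caller sleeps this many ms before requesting
  };
}
```

The first call returns 0 (fire immediately); a second call made right away returns the full delay, so back-to-back requests are spaced at least `minDelayMs` apart.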