jakubbalada's comments

jakubbalada | 9 years ago | on: Web Scraping: Bypassing “403 Forbidden,” captchas, and more

Both: developers on a free plan building their own RSS feeds for sites without one, and business people (mainly startups) building their products on top of Apifier.

A typical use is an aggregator that needs a common API for all partners who can't provide one themselves. With Apifier they can have an API running in an hour. It might break once in a while - then you have to update your crawler (not that often if you use internal AJAX calls).

jakubbalada | 9 years ago | on: Web Scraping: Bypassing “403 Forbidden,” captchas, and more

Disclaimer: I'm a co-founder of Apifier [1].

It's not open source, but it's free up to 10k pages per month. And it can handle modern JavaScript web applications (your code runs in the context of the crawled page). You can, for example, scrape an API key first and then use internal AJAX calls.
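The "scrape a key, then call the internal AJAX endpoint" idea can be sketched like this. This is a hypothetical illustration, not Apifier's actual API: the `apiKey` pattern and the `/internal/data` endpoint are invented for the example.

```javascript
// Sketch: extract an API key embedded in a page's HTML, then build
// the URL of the site's internal AJAX endpoint to call directly.
// Both the key pattern and the endpoint path are assumptions.

function extractApiKey(html) {
  // Look for something like: var apiKey = "abc123";
  const match = html.match(/apiKey\s*[:=]\s*["']([A-Za-z0-9]+)["']/);
  return match ? match[1] : null;
}

function buildAjaxUrl(baseUrl, apiKey) {
  // Reuse the key to hit the internal JSON endpoint instead of
  // parsing rendered HTML on every page.
  return `${baseUrl}/internal/data?key=${encodeURIComponent(apiKey)}`;
}
```

Calling the site's own JSON endpoints is usually more stable than scraping markup, which is why the crawler "doesn't break that often" with this approach.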

There's also a community page [2] where you can find and reuse crawlers made by other users.

[1] https://www.apifier.com [2] https://www.apifier.com/community/crawlers

jakubbalada | 10 years ago | on: Our Experience of the Inaugural Y Combinator Fellowship

(Apifier co-founder here) It will be less beneficial now than it was in the first YCF batch. The other fellows really motivated us during the group office hours and other events. Still, the main thing is to focus on the product, which you can do anywhere.

jakubbalada | 10 years ago | on: Show HN: Apifier – hosted web crawler for developers

You're right, a price per request would be easier to estimate. But since you can use JavaScript, you can scrape a whole website with just one page request (see the SFO flights example). In other words, our costs don't correlate with page requests, but with data transfer.

A flat fee is also possible, but we think it's fair for users to pay based on their consumption.

jakubbalada | 10 years ago | on: Show HN: Apifier – hosted web crawler for developers

Yes, by default we respect robots.txt. There is a switch to disable it - at your own risk. We don't fully respect Crawl-delay, but the minimum delay between requests for all our crawlers is set to 2000 ms. We don't publish our IP ranges yet.
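A minimum-delay floor like the 2000 ms described above can be sketched as a small throttle. This is a minimal illustration of the idea, not Apifier's implementation; the function names are invented for the example.

```javascript
// Sketch: a throttle that tells the caller how long to wait before
// the next request, enforcing a fixed minimum delay between requests.
// The clock is injectable so the logic can be tested without sleeping.

function makeThrottle(minDelayMs, now = Date.now) {
  let nextAllowed = 0; // earliest timestamp the next request may fire
  return function waitTime() {
    const t = now();
    const wait = Math.max(0, nextAllowed - t);
    nextAllowed = Math.max(t, nextAllowed) + minDelayMs;
    return wait; // caller sleeps this many ms before requesting
  };
}
```

The first call returns 0 (fire immediately); a second call made right away returns the full delay, so back-to-back requests are spaced at least `minDelayMs` apart.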