BlewisJS | 4 years ago

Unrelated to the article: is it just me, or is this ScrapingBee product borderline nefarious? From the homepage:

> Thanks to our large proxy pool, you can bypass rate limiting website, lower the chance to get blocked and hide your bots!

> Scrapingbee helps us to retrieve information from sites that use very sophisticated mechanism to block unwanted traffic, we were struggling with those sites for some time now and I'm very glad that we found ScrapingBee.

whakim|4 years ago

It really depends. There are plenty of legitimate uses for scraping (for example, I've been involved with academic research that involved scraping Twitter search results), and it's only really feasible to collect the amount of data you need using scraping plus paid proxies. That being said, there are also a number of nefarious paid proxy services which offer residential IPs (read: are usually botnets).

BlewisJS|4 years ago

What is legitimate to a user is not the same as what is legitimate to a site owner. The legitimate way would probably be to use the Twitter API.

stickfigure|4 years ago

No more nefarious than the measures websites put up to avoid scrapers? This just rehashes the hiQ Labs v. LinkedIn case: https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn

(not a user, but I do some amount of scraping through other means)

brimble|4 years ago

It is definitely super annoying that companies are allowed to spy on us and do all kinds of crazy things with our data, all using computers and automation and "bots" and such, but individuals are increasingly not allowed to use automation to help us out online. Seems rather one-sided. On the other hand, I get that abuse is a huge problem. I do wish that bots operating at roughly human request rates and daily request totals were at least considered OK and universally allowed, without the risk of blocks or other difficulties that raise maintenance costs (and so make the bots less valuable).

paxys|4 years ago

"Nefarious" is a strong word. Courts have repeatedly ruled that scraping data that is otherwise available publicly is legal. You may not personally agree with the ethics, but there are a lot of people who do.

BlewisJS|4 years ago

I agree it's a strong word, which is why I said borderline nefarious. However, it's not that far off from a DDoS tool.

At least in the United States, it sounds like the jury is still out on the legality: https://www.reuters.com/technology/us-supreme-court-revives-..., but my perspective was more from an ethics standpoint anyway.

fjabre|4 years ago

Nefarious? Then they should arrest Google first; it is the king of web scrapers.

NicoJuicy|4 years ago

Robots.txt
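
For anyone unfamiliar, robots.txt is the file a site publishes to tell crawlers which paths they may fetch. Below is a minimal sketch of how a well-behaved scraper might honor it, assuming Python's standard `urllib.robotparser`. The robots.txt content, the `example-bot` user agent, and the `example.com` URLs are all hypothetical and only there for illustration; a real crawler would fetch the file from the site itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs without network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer fetch questions."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "example-bot" is an assumed user-agent name; it falls under the "*" rules here.
allowed = parser.can_fetch("example-bot", "https://example.com/public/page")
blocked = parser.can_fetch("example-bot", "https://example.com/private/data")
delay = parser.crawl_delay("example-bot")

print(allowed, blocked, delay)
```

Note that robots.txt is advisory: nothing in this check stops a scraper from ignoring the answer, which is roughly the ethical question the thread is arguing about.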