KieranMac | 1 year ago

I'm a lawyer who works in the web-scraping space, and I always chuckle when I read threads like this. Almost every company that we now consider a monopolist (or their affiliates) in the tech space used scraping as part of their process to build their business, and almost every one of those same monopolists now prohibits startups and competitors from scraping their data (which, invariably, is not actually "their" data in any sort of legally cognizable sense). And so perhaps the ethics of web scraping are not so straightforward. And neither are the legal issues associated with it.

I wrote an article about that last fall that got some attention here.

https://news.ycombinator.com/item?id=37264676

richardw|1 year ago

Same thing with Facebook and identity. IIRC they leveraged Google’s address book to get traction, but will go after you if you try to store FB social graph data long term for anything outside their garden.

You try to block the tricks you used to get growth, basically.

jMyles|1 year ago

> And so perhaps the ethics of web scraping are not so straightforward.

It strikes me that the _ethics_ of web scraping are extremely straightforward and cognizable with a terse analysis:

* You can respond however you like to my HTTP request, and I can parse your response however I like.

Simple, traditional, common. This is the way that conversations have occurred since the dawn of human communication, no?

> the legal issues associated with it.

But aren't these, without exception, fabrics spun out of the cloth that shields established players with the threat of state violence? This is not particularly new, and seems to fit in the pathetic-and-predictable file.

Moreover, the broader cheap attempt to cast this in "intellectual" property terms, and to attach that to protection of artists and creators, warrants a very particular eye-roll for its illogic.

theamk|1 year ago

Do you apply these ethics to web scraping only, or to all other network communications too?

Because if those are your general principles, you are making the internet much shittier. I still remember the old internet with open SMTP servers, easy-to-use comment forms, and forums which did not require emails and CAPTCHAs. But people with the "You can respond however you like to my HTTP request" attitude ruined it with spam, scams and SEO.

If you only apply this to web scraping, then where do you draw the line and why? Can you scrape at the maximum rate the server can support? Can you scrape if this requires active action (like account creation)? As long as you scrape, can you also post some links to improve your SEO?

elicksaur|1 year ago

If I say, “Hey, please don’t text me anymore. I’m going to block this number,” and you respond by buying 500 phones in five cities and texting me nonstop, is that ethical?