I'm a lawyer who works in the web-scraping space, and I always chuckle when I read threads like this. Almost every company that we now consider a monopolist (or their affiliates) in the tech space used scraping as part of their process to build their business, and almost every one of those same monopolists now prohibits startups and competitors from scraping their data (which, invariably, is not actually "their" data in any sort of legally cognizable sense). And so perhaps the ethics of web scraping are not so straightforward. And neither are the legal issues associated with it. I wrote an article about that last fall that got some attention here.
https://news.ycombinator.com/item?id=37264676
richardw|1 year ago
You try to block the tricks you used to get growth, basically.
jMyles|1 year ago
It strikes me that the _ethics_ of web scraping are extremely straightforward and cognizable with a terse analysis:
* You can respond however you like to my HTTP request, and I can parse your response however I like.
Simple, traditional, common. This is the way that conversations have occurred since the dawn of human communication, no?
> the legal issues associated with it.
But aren't these, without exception, fabrics spun out of the cloth that shields established players with the threat of state violence? This is not particularly new, and seems to fit in the pathetic-and-predictable file.
Moreover, the broader cheap attempt to cast this in "intellectual" property terms, and to attach that to protection of artists and creators, warrants a very particular eye-roll for its illogic.
theamk|1 year ago
Because if those are your general principles, you are making the internet much shittier. I still remember the old internet with open SMTP servers, easy-to-use comment forms, and forums which did not require emails and CAPTCHAs. But people with the "You can respond however you like to my HTTP request" attitude ruined it with spam, scams, and SEO.
If you only apply this to web scraping, then where do you draw the line, and why? Can you scrape at the maximum rate the server can support? Can you scrape if this requires active action (like account creation)? As long as you scrape, can you also post some links to improve your SEO?
elicksaur|1 year ago