captainmuon | 9 months ago
It gets annoying when you have the right to scrape something - either because the owner of the data gave you the OK or because it is openly licensed. But then the webmaster can't be bothered to relax the rate limiter for you, and nobody can give you a nice API. Now people are putting their Open Educational Resources, their open source software, even their freaking essays about openness that they want the world to read behind Anubis. It makes me shake my head.
I understand perfectly that it is annoying when badly written bots hammer your site. But then maybe HTTP and those bots are the problem. Maybe we should make it easier for site owners to push their content somewhere we can scrape it more easily?
Analemma_|9 months ago
berkes|9 months ago
yladiz|9 months ago
To be frank: it’s not your content, it’s theirs. Whether you like it or not, they can decide what they want to do with it, and you’re not entitled to it. Yes, there are some cases where you personally have permission to scrape, or the license explicitly permits it, but this isn’t the norm.
The bigger issue isn’t that people don’t want their content to be read; it’s that in most cases they want it to be read and consumed by a human, and they want their server resources (network bandwidth, CPU, etc.) to be used in a manageable way. If these bots were written to be respectful, then maybe we wouldn’t be in this situation. The badly behaved bots poisoned the well, and respectful bots now suffer for it.
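For what "respectful" could look like in practice, here is a minimal sketch, assuming the site publishes a robots.txt and the bot is willing to honor it. It uses Python's standard `urllib.robotparser`; the function name `plan_fetch`, the user agent `example-bot`, the URLs, and the one-second fallback delay are all made up for illustration and are not taken from any real bot.

```python
from urllib.robotparser import RobotFileParser


def plan_fetch(robots_txt, user_agent, url, default_delay=1.0):
    """Return the seconds to wait before fetching `url`,
    or None if robots.txt disallows it for `user_agent`.

    `default_delay` is an assumed fallback for sites that
    do not state a Crawl-delay of their own.
    """
    parser = RobotFileParser()
    # Parse robots.txt text we already have, so this sketch needs no network.
    parser.parse(robots_txt.splitlines())

    if not parser.can_fetch(user_agent, url):
        return None  # the site owner said no; do not fetch

    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default_delay


# Hypothetical robots.txt for illustration only.
ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

if __name__ == "__main__":
    print(plan_fetch(ROBOTS, "example-bot", "https://example.org/essays/openness.html"))
    print(plan_fetch(ROBOTS, "example-bot", "https://example.org/private/notes.html"))
```

A real bot would also sleep for the returned delay between requests to the same host and back off on 429 or 503 responses; that part is left out here.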