top | item 7066807

hartard | 12 years ago

I love the execution, but I also see inherent problems.

Robots.txt is just a convention to advise crawlers. I'm confident most sites explicitly state this is against their terms of service.

You will encounter terms along the lines of:

"Unauthorized uses of the Site also include, without limitation, those listed below. You agree not to do any of the following, unless otherwise previously authorized by us in writing: Use any robot, spider, scraper, other automatic device, or manual process to monitor, copy, or keep a database copy of the content or any portion of the Site."


pranade | 12 years ago

You've got a valid point. We eventually want to create a space that allows responsible scraping - so webmasters have access to analytics on what's being scraped and can explicitly turn off kimono APIs for their domains if they see fit. We also think there are use cases for people who own their own data. For example, these APIs can give companies a way to streamline their internal app development and figure out what to expose to the developer community before investing in an expensive API deployment.

_delirium | 12 years ago

The law isn't entirely blind to conventions, though. They don't guarantee anything, but if a court understood that there exists a convention for saying "no robots, please", and the robot operator in question followed it, then a court could well look less favorably on the damages claims of a website operator who didn't make use of the widely known convention.