top | item 46165941

james2doyle | 2 months ago

This is just using robots.txt and asking "pretty please, don’t scrape me".

Here is an article (from TODAY) about the case where Perplexity is being accused of ignoring robots.txt: https://www.theverge.com/news/839006/new-york-times-perplexi...

If you think a robots.txt is the answer to stopping the billion-dollar AI machine from scraping you, I don’t know what to say.

Aeolun|2 months ago

If someone has a robots.txt, and I want to request their page in an automated way, should I open the browser to do it instead of issuing a curl request? How about if I ask Claude to fetch the page for me?

kentm|2 months ago

Respect the robots.txt and don’t do it?
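
For anyone wondering what "respecting" it looks like in an automated client, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The sample rules, the `ExampleBot` user agent, and the `is_allowed` helper are made up for illustration; a real client would fetch the site's actual robots.txt first. Note that this check is purely voluntary on the client side, which is the point made at the top of the thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "https://example.com/page"),
        ("curl", "https://example.com/page"),
        ("curl", "https://example.com/private/x"),
    ]:
        print(agent, url, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

A polite client would run a check like this before every request and skip the fetch when it returns False. Nothing on the server enforces the result.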

cpncrunch|2 months ago

Yes, I was referring to legitimate companies, and Perplexity doesn't seem to be one of those.

albedoa|2 months ago

Oh for sure. When he wrote of the AI companies that are "stealing/crawling/hammering", you thought he meant the legitimate ones that do honor robots.txt. That makes sense.