avallach | 6 months ago

Cloudflare did explain a proper solution: "Separate bots for separate activities". In this case that means one bot for scraping/indexing and another for non-persistent, user-driven retrieval.

Website owners have a right to block both if they wish. Isn't it obvious that bypassing a bot block is a violation of the owner's right to decide whom to admit?

Perplexity almost seems to believe that "robots.txt was only made for scraping bots, so if our bot is not scraping, it's fair for us to ignore it and bypass the enforcement". And their core business is a bot, so they really should have known better.

viraptor|6 months ago

They're already doing that: https://docs.perplexity.ai/guides/bots. There's PerplexityBot and Perplexity-User.
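For illustration, here is a rough sketch of how a site operator could address the two agents separately in robots.txt and check the result with Python's standard `urllib.robotparser`. The rules, the `example.com` URLs, and the `can_fetch` helper are made up for this example; only the two user-agent names come from the comment above. It shows what the file asks for, not what any crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the indexing bot everywhere,
# let the user-driven agent in except for /private/.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /private/

User-agent: *
Allow: /
"""


def can_fetch(agent: str, url: str) -> bool:
    """Return True if the hypothetical rules above permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("PerplexityBot", "Perplexity-User"):
        for url in ("https://example.com/article", "https://example.com/private/x"):
            print(agent, url, can_fetch(agent, url))
```

robots.txt is only a request: nothing in it stops a client that sends a different user-agent string, which is why operators pair it with server-side blocking.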

avallach|6 months ago

And then, once they see that the website operator has blocked Perplexity-User, apparently instead of respecting that, they not only ignore robots.txt but actively try to bypass the security measures established with the explicit purpose of limiting their access. If this were about bypassing DRM rather than an AI-WAF, it would be plainly illegal.

To me this invalidates their whole claim that Cloudflare fails to tell the difference between a scraper and a user-driven agent. On the contrary, distinguishing them is trivial, and the block is intentional.

skeledrew|6 months ago

> bypassing a bot block is a violation of the owner's right to decide whom to admit?

There is only a violation if the bot finds a way around a login block. The same goes for a human. But whatever is on the public web is... public. For all.

hunter2_|6 months ago

So it's ok to block someone "because you didn't include a session token I gave you in exchange for knowing the password" but it's not ok to block someone "because you didn't stick to manually-operated user agents as I told you via robots.txt"? What about not letting someone play level 42 "because you didn't complete level 41"?

A web server providing a response to your request is akin to a restaurant server doing the same. Except for specific situations related to civil rights, they are free to not deal with you for any reason.