avallach | 6 months ago
To me this invalidates their whole claim that Cloudflare fails to tell the difference between scraper and user-driven agent. Instead, distinguishing them is trivial, and the block is intentional.
skeledrew | 6 months ago
avallach | 6 months ago
But it also feels like essentially "pirating" the webpages while erasing their brand. Maybe it's even a tolerable transitional situation, but you can't even argue it's beneficial the way some argue game piracy can be. In the long term, we need an incentive for content creators to willingly allow such processing. Otherwise, a lot of high-quality content will eventually become members-only, with DRM-like anti-agent protections.
The incentive doesn't have to be monetary. For example, I could imagine some website owners allowing AI agents that commit, before copying or summarizing, to repeating verbatim some sort of mandatory headers/messages/acknowledgements from the content authors, and that are known to stick to this commitment.
You can also bypass the problem already today by accessing and copying the content manually, then putting it into the context of a tool like NotebookLM. Nobody is hurt, because you have actually seen the source yourself, and that's all the website owners can reasonably demand.
TL;DR: why even post quality content in the open if the audience won't see your ads, your donation button, or even your name? What do you think?