Jiocus | 1 year ago

That's why it's possible to have a default deny rule in robots.txt:

    User-agent: *
    Disallow: /
And possibly allow-list the ones you accept. This probably won't change the fact that you may allow a vendor at one point in time, only to realise they changed their crawling use case and have been scraping data for AI training for the past 6 months (before they go public about it).
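As a sketch of the allow-list variant (the crawler name here is only an illustration, substitute whichever vendors you actually accept), the more specific group wins for a matching crawler while everything else falls under the catch-all deny:

    User-agent: Googlebot
    Allow: /

    User-agent: *
    Disallow: /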

It can be argued that if you are a server operator, you always know which User-agents are making requests to your resources.
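For instance, a minimal sketch of how an operator might tally the User-Agents hitting a server, assuming the common nginx/Apache "combined" log format where the User-Agent is the last quoted field (the log path is just an example):

    import re
    from collections import Counter

    # In the "combined" log format the User-Agent is the last quoted field on each line.
    UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

    def user_agent_counts(log_path):
        counts = Counter()
        with open(log_path, errors="replace") as log:
            for line in log:
                match = UA_PATTERN.search(line)
                if match:
                    counts[match.group(1)] += 1
        return counts

    for agent, hits in user_agent_counts("access.log").most_common(20):
        print(f"{hits:8d}  {agent}")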
