One easy bypass is to let external services that already fetch robots.txt (archive.org, GitHub Actions, etc.) cache it, and then expose the cached copy to the actual scrape server through a separate API, a webhook, or a manual download.
The robots.txt file is usually small, so these fetches are unlikely to alert the external services.
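For illustration, here is a minimal Python sketch of the archive.org variant, using the Wayback Machine availability API to grab a cached copy instead of hitting the target host directly (the domain and helper name are hypothetical; a GitHub Action or any other relay would work the same way):

```python
import requests

def fetch_cached_robots(domain: str) -> str | None:
    """Hypothetical helper: ask the Wayback Machine for its most recent
    snapshot of a site's robots.txt and download it, so the scrape
    server never contacts the target host itself."""
    target = f"https://{domain}/robots.txt"
    avail = requests.get(
        "https://archive.org/wayback/available",
        params={"url": target},
        timeout=10,
    ).json()
    snapshot = avail.get("archived_snapshots", {}).get("closest")
    if not snapshot or not snapshot.get("available"):
        return None  # no cached copy; fall back to another relay
    return requests.get(snapshot["url"], timeout=10).text

if __name__ == "__main__":
    print(fetch_cached_robots("example.com"))
```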