One easy bypass is to let external services that already fetch robots.txt (archive.org, GitHub Actions, etc.) cache it, and then expose the cached copy to the actual scrape server through a separate API, a webhook, or a manual download.
The robots.txt file is usually small, so these fetches are unlikely to alert the external services.
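For illustration, here is a minimal Python sketch of the archive.org variant, using the Wayback Machine availability API to grab a cached copy instead of hitting the target host directly (the domain and helper name are hypothetical; a GitHub Action or any other relay would work the same way):

```python
import requests

def fetch_cached_robots(domain: str) -> str | None:
    """Hypothetical helper: ask the Wayback Machine for its most recent
    snapshot of a site's robots.txt and download it, so the scrape
    server never contacts the target host itself."""
    target = f"https://{domain}/robots.txt"
    avail = requests.get(
        "https://archive.org/wayback/available",
        params={"url": target},
        timeout=10,
    ).json()
    snapshot = avail.get("archived_snapshots", {}).get("closest")
    if not snapshot or not snapshot.get("available"):
        return None  # no cached copy; fall back to another relay
    return requests.get(snapshot["url"], timeout=10).text

if __name__ == "__main__":
    print(fetch_cached_robots("example.com"))
```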