top | item 43688149


outloudvi | 10 months ago

Vercel has a fairly generous free quota but a notably expensive paid pricing scheme. I think people still remember https://service-markup.vercel.app/ .

As for the crawl problem, I want to wait and see whether robots.txt proves sufficient to stop GenAI bots from crawling, since I'm fairly confident these GenAI companies are too "well-behaved" to respect robots.txt.


otherme123|10 months ago

This is my experience with AI bots. This is my robots.txt:

User-agent: *
Crawl-Delay: 20

Clear enough. Google, Bing and others respect the limits, and while about half my traffic is bots, they never DoS the site.
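For reference, here is a minimal sketch of how a well-behaved crawler could read that directive, using Python's standard `urllib.robotparser`. The helper name `crawl_delay_for` and the user agent `ExampleBot` are made up for illustration; nothing here describes how any particular crawler is actually implemented.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# The robots.txt from this comment, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Crawl-Delay: 20
"""


def crawl_delay_for(robots_txt: str, user_agent: str) -> Optional[int]:
    """Return the Crawl-delay (in seconds) that applies to user_agent, if any.

    Hypothetical helper: it parses the robots.txt text in memory instead of
    fetching it over the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    delay = crawl_delay_for(ROBOTS_TXT, "ExampleBot")
    # A polite crawler would sleep at least this long between requests.
    print(f"Wait {delay} seconds between requests")
```

A crawler that honors the file would call something like this once per host and then sleep for the returned delay between requests.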

When a very well-known AI bot crawled my site in August, it tripped every defense I had: fail2ban put them temporarily in jail multiple times, the nginx per-IP request limit was serving 426 and 444 to more than half their requests (but they kept hammering the same URLs), and some human users contacted me complaining about the site going 503. I had to block the bot IPs at the firewall. They ignore (if they even read) the robots.txt.
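To illustrate the idea behind a per-IP request limit, here is a rough token-bucket sketch in Python. It is not nginx's actual implementation (nginx's `limit_req` is documented as a leaky bucket), and the class name `PerIpLimiter` and its `rate` and `burst` parameters are assumptions made up for this example.

```python
from typing import Dict, Tuple


class PerIpLimiter:
    """Hypothetical per-IP token bucket: `rate` tokens/second, up to `burst`."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        # Maps IP -> (tokens remaining, timestamp of last update).
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, ip: str, now: float) -> bool:
        """Return True if a request from `ip` at time `now` may proceed."""
        tokens, updated = self._buckets.get(ip, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - updated) * self.rate)
        if tokens >= 1.0:
            self._buckets[ip] = (tokens - 1.0, now)
            return True
        # Over the limit: a server would typically reject or drop here.
        self._buckets[ip] = (tokens, now)
        return False


if __name__ == "__main__":
    limiter = PerIpLimiter(rate=1.0, burst=2.0)
    for t in (0.0, 0.0, 0.0, 1.0):
        print(t, limiter.allow("203.0.113.7", t))
```

Passing the clock in as `now` keeps the sketch deterministic; a real deployment would use a monotonic clock and evict idle entries. A client that keeps hammering the same URLs simply stays rejected, which is why a firewall block ends up being the last resort.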

dvrj101|10 months ago

Nope, they have been ignoring robots.txt since the start. There are multiple posts about it all over the internet.