jackienotchan | 1 year ago

I'm noticing a big increase in crawling activity on the sites I manage, likely from bots collecting data for LLMs. Most of them don't send proper user agents, and of course they don't follow any of the scraping best practices the industry has developed over the past two decades.

This trend is creating a lot of headaches for developers responsible for maintaining heavily scraped sites.

related:

- "Dear AI Companies, instead of scraping OpenStreetMap, how about a $10k donation?" - https://news.ycombinator.com/item?id=41109926

- "Multiple AI companies bypassing web standard to scrape publisher sites" - https://news.ycombinator.com/item?id=40750182
