The bots could be crawlers gathering data to be used periodically as raw training data, or the requests could come from a web search agent of some form, such as ChatGPT looking up the latest news stories on topic X. I don't know whether robots.txt can distinguish between the two types of bot request, or whether LLM providers even honor it in either case.
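
For reference, robots.txt rules are grouped by User-agent token, so the two kinds of request can only be told apart if a provider sends a different token for each. Here is a rough sketch using Python's standard urllib.robotparser. The tokens GPTBot (training crawler) and ChatGPT-User (user-triggered fetch) are assumed here from the names OpenAI has published, and example.com, the paths, and the allowed() helper are made up for illustration. It only shows how the rules are matched; whether a given bot actually obeys them is a separate question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the training crawler, allow the
# user-triggered agent, and keep /private/ off limits for everyone else.
# The agent tokens are assumptions, not taken from the comment above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if the parsed rules let `agent` fetch `path`."""
    return parser.can_fetch(agent, "https://example.com" + path)


if __name__ == "__main__":
    for agent in ("GPTBot", "ChatGPT-User", "SomeOtherBot"):
        for path in ("/news/story", "/private/notes"):
            print(agent, path, allowed(agent, path))
```

The file is purely advisory: it is a request to the bot, and nothing on the server enforces it unless you also filter on the User-Agent header or IP ranges yourself.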
jay_kyburz|4 months ago