lolinder | 8 months ago
> A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced.
There's nothing recursive about "summarize all the cooking recipes linked on this page". That's a single-level iterative loop.
I will grant that I should alter my original statement: if OP wanted to respect robots.txt when the tool receives a request that should be interpreted as an instruction to recursively fetch pages, then I'd think that's an appropriate use of robots.txt, because that's not materially different from implementing a web crawler by hand in code.
But that represents a tiny subset of the queries that will go through a tool like this, and respecting robots.txt for non-recursive requests would lead to silly outcomes like the browser refusing to load reddit.com [0].
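To make the distinction concrete, here's a rough Python sketch of the two cases. Everything in it is made up for illustration: the `PAGES` link graph, the robots.txt rules, the `example-agent` name, and the function names are all hypothetical, and the "fetch" is just a dictionary lookup rather than a real HTTP request. It is not how OP's tool works, only one way the split could look.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "web": URL -> URLs it links to (stands in for real fetches).
PAGES = {
    "https://example.com/": ["https://example.com/a", "https://example.com/private/b"],
    "https://example.com/a": ["https://example.com/c"],
    "https://example.com/private/b": [],
    "https://example.com/c": [],
}

# Hypothetical robots.txt rules, parsed from text instead of being downloaded.
ROBOTS = RobotFileParser()
ROBOTS.parse(["User-agent: *", "Disallow: /private/"])


def fetch_links(url):
    """Return the links found on a page (assumed helper, no network access)."""
    return PAGES.get(url, [])


def single_level(url):
    """User-directed request: the page plus what it links to, one level deep.

    A single iterative pass, so no robots.txt check is applied here.
    """
    return [url] + fetch_links(url)


def recursive_crawl(url, agent="example-agent", seen=None):
    """Crawler-like request: follow links recursively, honoring robots.txt."""
    seen = set() if seen is None else seen
    if url in seen or not ROBOTS.can_fetch(agent, url):
        return seen
    seen.add(url)
    for link in fetch_links(url):
        recursive_crawl(link, agent, seen)
    return seen
```

In this sketch `single_level("https://example.com/")` still returns the `/private/b` link, since it behaves like a browser loading what the user asked for, while `recursive_crawl` skips anything under `/private/` because at that point it is acting like a robot in the sense of the quoted definition.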