(no title)
archievillain | 7 months ago
Objectively, "I give you one (1) URL and you traverse the link to it so you can get some metadata" still counts as crawling, but I think that's not how most people conceptualize the term.
It'd be like telling someone "I spent part of the last year travelling." and when they ask you where you went, you tell them you commuted to-and-fro your workplace five times a week. That's technically travelling, although the other person would naturally expect you to talk about a vacation or a work trip or something to that effect.
JimDabell|7 months ago
It’s definitely not crawling as robots.txt defines the term. :
> WWW Robots (also called wanderers or spiders) are programs that traverse many pages in the World Wide Web by recursively retrieving linked pages.
— https://www.robotstxt.org/orig.html
You will see that reflected in lots of software that respects robots.txt. For instance, if you fetch a URL with wget, then it won’t look at robots.txt. But if you mirror a site with wget, then it will fetch the initial URL, then it will find the links in that page, then before fetching subsequent pages it will fetch and check robots.txt.