Let's say you had a local model with the ability to make tool calls. You give that LLM the ability to use a browser. The LLM opens the browser, goes to Google or Bing, and runs whatever searches it needs.
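The loop described above can be sketched in a few lines. Everything here is hypothetical scaffolding (`run_search`, `model_step`, the message format), standing in for a real local model and a real browser-driving tool:

```python
import json

def run_search(query: str) -> str:
    # Stand-in for driving a real browser/search engine.
    return f"results for {query!r}"

def model_step(messages):
    # Stand-in for a local LLM deciding to call the search tool.
    return {"tool": "search", "arguments": json.dumps({"query": "weather"})}

messages = [{"role": "user", "content": "What's the weather?"}]
call = model_step(messages)
if call.get("tool") == "search":
    args = json.loads(call["arguments"])
    # Tool output is appended so the model can read it on the next step.
    messages.append({"role": "tool", "content": run_search(args["query"])})
print(messages[-1]["content"])  # → results for 'weather'
```

The point is that the harness, not the model, actually issues the HTTP requests; the model only emits structured tool calls.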
I think they mean it's a tool accessing URLs in response to a user request, to present to that user live - with the user being a human. Like if you used a webpage translation service, or a non-ML summarizer.
There's some gray area though, and the search engine indexing in advance (not sure if they've partnered with Bing/Google/...) should still follow robots.txt.
In practice, robots.txt controls which pages appear in Google results, and it's respected as a matter of courtesy, not legality. It doesn't prevent proxies etc. from accessing your sites.
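That's the "courtesy" part: a well-behaved client checks robots.txt before fetching, but nothing enforces it. A minimal sketch using Python's stdlib `urllib.robotparser` (the rules and URLs here are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch
# https://example.com/robots.txt first.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# A polite crawler checks before each request; a client that skips
# this check can still fetch the page anyway.
print(rp.can_fetch("MyBot", "https://example.com/public/page"))   # True
print(rp.can_fetch("MyBot", "https://example.com/private/page"))  # False
```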
pests|11 months ago
What is the difference if I use a browser or an LLM tool (or curl, or wget, etc.) to make those requests?
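From the server's point of view there often isn't one, beyond whatever headers the client chooses to send. A sketch with the stdlib `urllib.request` (the User-Agent strings are arbitrary examples; no request is actually sent):

```python
import urllib.request

# Whether the client is a browser, curl, or an LLM tool, the server
# only sees the headers the client chooses to send.
for ua in ("Mozilla/5.0 (X11; Linux x86_64)", "curl/8.5.0", "MyLLMTool/0.1"):
    req = urllib.request.Request("https://example.com/",
                                 headers={"User-Agent": ua})
    # urllib stores header keys capitalized as "User-agent".
    print(req.get_header("User-agent"))
```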
Tostino|11 months ago
Why would that be an issue?
bayindirh|11 months ago
I thought they were just machine code running on part GPU and part CPU.