hadrien01 | 8 months ago

They also write this:

> Since a user requested the fetch, this fetcher generally ignores robots.txt rules.

bitpush | 8 months ago

> Since a user requested the fetch, this fetcher generally ignores robots.txt rules.

Normally the expectation is that the user-agent faithfully presents the content it fetched.

If I make a browser that fetches bbc.com, strips away the ads, and presents the result to users, I would expect the BBC to dislike it and block that user-agent. It isn't a robots.txt thing; it's a user-agent thing.
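
A minimal sketch of the server-side, user-agent-based blocking described above, assuming the request headers arrive as a plain dict. The blocklist and function names are hypothetical; "Perplexity-User" is the only token taken from the thread. Unlike robots.txt, the server enforces this itself, though it only catches clients that report an honest User-Agent, since the client sets that header.

```python
# Hypothetical blocklist; "Perplexity-User" is the user-agent token discussed in the thread.
BLOCKED_AGENT_TOKENS = ("Perplexity-User",)


def user_agent_of(headers: dict[str, str]) -> str:
    """Return the User-Agent header value, matching the header name case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "user-agent":
            return value
    return ""


def status_for_request(headers: dict[str, str]) -> int:
    """Return HTTP 403 if the User-Agent contains a blocked token, otherwise 200."""
    user_agent = user_agent_of(headers).lower()
    if any(token.lower() in user_agent for token in BLOCKED_AGENT_TOKENS):
        return 403
    return 200


if __name__ == "__main__":
    # Made-up User-Agent strings, for illustration only.
    print(status_for_request({"User-Agent": "Mozilla/5.0 (compatible; Perplexity-User/1.0)"}))
    print(status_for_request({"user-agent": "Mozilla/5.0 (X11; Linux x86_64)"}))
```

The substring match is deliberately loose because real User-Agent strings wrap the product token in other text.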

simonw | 8 months ago

Oh wow, I missed that! That's from the docs for the Perplexity-User user-agent, at which point presumably there's no point in listing it in robots.txt at all?
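
A minimal sketch of why such a listing may be moot, using Python's standard `urllib.robotparser`. The robots.txt content and URL below are made up. The rule only has an effect when a fetcher asks something like `can_fetch` before requesting a page; a fetcher that "generally ignores robots.txt rules" never performs this check.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt that disallows Perplexity-User everywhere and allows everyone else.
ROBOTS_TXT = """\
User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler would consult the parser before each fetch.
blocked = parser.can_fetch("Perplexity-User", "https://example.com/article")
allowed = parser.can_fetch("SomeOtherBot", "https://example.com/article")
print(blocked, allowed)  # False True
```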