hadrien01 | 8 months ago

They also write this:

> Since a user requested the fetch, this fetcher generally ignores robots.txt rules.

bitpush|8 months ago

> Since a user requested the fetch, this fetcher generally ignores robots.txt rules.

Normally the expectation is that the user-agent faithfully presents the content it fetched.

If I make a browser that fetches bbc.com, strips away the ads, and presents the result to users, I would expect the BBC not to like it and to block the user-agent from accessing the site. It isn't a robots.txt thing. It is a user-agent thing.

simonw|8 months ago

Oh wow, I missed that! That's from the docs for the Perplexity-User user-agent. In that case, presumably there's no point in listing it in robots.txt at all?
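
For context, here is a rough sketch of what honoring robots.txt would look like for a fetcher that did choose to check it, using Python's standard `urllib.robotparser`. The robots.txt contents, the `is_allowed` helper, and the example.com URL are made up for illustration; none of this is Perplexity's code, and the quoted docs say the real fetcher generally skips this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; not taken from any real site.
ROBOTS_TXT = """\
User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Perplexity-User", "SomeBrowser"):
        print(agent, is_allowed(agent, "https://example.com/article"))
```

A compliant client would skip the request whenever `is_allowed` returns False. Since robots.txt is purely advisory, a fetcher that never runs a check like this is unaffected by the listing, and only server-side blocking would stop it.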

dabeeeenster|8 months ago

I mean, that's just not true.

esskay|8 months ago

Which part? It's well established that many AI crawlers ignore robots.txt, Perplexity being one of them [1].

[1] https://www.tomshardware.com/tech-industry/artificial-intell...