1 in 3 news sites block OpenAI via robots.txt

2 points | palewire | 2 years ago | palewi.re

2 comments

palewire | 2 years ago

The 392 news organizations listed at this URL have instructed OpenAI’s GPTBot to not scan their sites, according to a continual survey of 1,119 online publishers conducted by the homepages.news archive. That amounts to 35.0% of the total.

The artificial intelligence company has suggested it will not train future editions of ChatGPT using sites that opt out of GPTBot crawls via the robots.txt convention.

Our archiving system gathers each news organization’s robots.txt file twice per day. This page automatically updates with the latest results.
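To illustrate the kind of robots.txt check described here, below is a minimal Python sketch using the standard library's `urllib.robotparser`. It is not the homepages.news archive's actual code. The domains and file contents are hypothetical, and counting a site as "blocking" only when GPTBot may not fetch the root path `/` is a simplifying assumption.

```python
from urllib.robotparser import RobotFileParser


def blocks_gptbot(robots_txt: str) -> bool:
    """Return True if these robots.txt rules forbid GPTBot from fetching the site root."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    # Simplifying assumption: only the root path "/" is checked.
    return not parser.can_fetch("GPTBot", "/")


def share_blocking(robots_files: dict[str, str]) -> float:
    """Return the percentage of sites whose robots.txt blocks GPTBot."""
    if not robots_files:
        return 0.0
    blocked = sum(blocks_gptbot(text) for text in robots_files.values())
    return 100 * blocked / len(robots_files)


# Hypothetical robots.txt files for made-up domains.
samples = {
    "news-a.example": "User-agent: GPTBot\nDisallow: /\n",
    "news-b.example": "User-agent: CCBot\nDisallow: /\n",
    "news-c.example": "User-agent: *\nDisallow: /private/\n",
    "news-d.example": "User-agent: *\nAllow: /\n\nUser-agent: GPTBot\nDisallow: /\n",
}

# prints "50.0% of sampled sites block GPTBot"
print(f"{share_blocking(samples):.1f}% of sampled sites block GPTBot")
```

Parsing the saved text rather than calling `read()` keeps the check offline, which suits an archive that already stores a copy of each file.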

The sites we track represent a best-effort attempt to cover a broad cross-section of news publishing.

That said, the sample is not comprehensive. It's also primarily focused on the English-language market.

jlpcsl | 2 years ago

This should be opt-in so this corporate piracy is blocked by default.