ErrrNoMate | 2 years ago
Publishers need better ways to signal how their content may be used in search and machine learning. robots.txt does not cover all use cases: it is a site-wide, path-based mechanism, and its directives are not carried along with the content itself. A complementary approach like the one proposed here can be applied to individual webpages as desired and preserved with those pages in datasets of web content.
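The preservation point can be sketched in code. The snippet below assumes a hypothetical per-page meta tag, `usage-preferences` (the name and values are illustrative, not part of any standard), and shows how a crawler could extract it and store it alongside the page in a dataset record, so the preference travels with the content:

```python
from html.parser import HTMLParser

class UsagePrefParser(HTMLParser):
    """Extracts a hypothetical <meta name="usage-preferences"> tag from a page."""
    def __init__(self):
        super().__init__()
        self.pref = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if d.get("name") == "usage-preferences":
                self.pref = d.get("content")

# A page declaring its own usage preference (tag name is an assumption).
page = ('<html><head>'
        '<meta name="usage-preferences" content="noml">'
        '</head><body>Article text</body></html>')

parser = UsagePrefParser()
parser.feed(page)

# The preference is stored with the page, so it survives in the dataset
# even when the page is served outside its original site context.
record = {"html": page, "usage_preferences": parser.pref}
print(record["usage_preferences"])  # noml
```

Unlike a robots.txt rule, which lives in a separate file at the site root, this per-page declaration remains attached to the document wherever the document goes.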