That would be bad, and it is already bad that Google and Microsoft control such a large share of search queries, but the decision about which search engine indexes a website is purely the publisher's.
A publisher publishes. Once something is published, once something is public, the control a publisher has over the published thing is limited. For example, a publisher cannot choose who reads a book after it is sold, or who reads an article after it is printed. It is not a given at all that a publisher should have any say about being indexed. The search engine relies on a public fact - X wrote Y. That's legal (limitations apply).
Brave's position of not accepting being blacklisted unless all search engines are blacklisted is a pragmatic one. It works against Google's search monopoly, but still gives them some legal cover since the robots.txt is not completely ignored, in case that indeed matters somewhere. I think it's an elegant approach well suited for the current state of the web, one that serves the greater good. It also serves Brave, but that's completely fine.
A search engine index is an economic exchange between the search engine and the publisher.
To massively (over)simplify the argument to its essence (and ignore other important points): the publisher goes through the trouble and expense of creating the content.
The publisher then allows its content to be copied by a search engine only because being shown in search results gets it traffic back. The traffic it gets in return has value, and the publisher is happy for this arrangement to continue as long as the value of the traffic is more than the cost of producing and serving the content.
Brave offering a "license", for its own financial benefit, to "allow" others to use the content for LLM training gives zero benefit to the original publisher. This is why I use words like "sleazy" to describe Brave's position.
This argument applies to Google and Microsoft. Right now both are failing at citing sources in their generative AI search results. That is terrible and I hope it's fixed soon, as otherwise they're being sleazy scrapers as much as Brave is.
Finally, I wholeheartedly disagree that what Brave is doing is for the "greater good". The fact that they charge extra for the "license" to use the content for LLM training shows that.
You are completely wrong. A publisher controls the publication of a work, text or other, including its duplication and licensing. Other companies cannot xerox a book and sell it. Clear?
onli|2 years ago
pierrefar|2 years ago
user_named|2 years ago
cvalka|2 years ago
yreg|2 years ago
pierrefar|2 years ago
https://news.ycombinator.com/item?id=36993739