top | item 39081032

(no title)

enord | 2 years ago

Listen, most website and book-authors want to be indexed by google. It brings potential audience their way, so most don’t make use of their _right_ to be de-listed. For these models, there is no plausible benefit to the original creators, and so one has to argue they have _no_ such right to be “de-listed” in order to get any training data currently under copyright.

discuss

Ukv|2 years ago

> It brings potential audience their way, so most don’t make use of their _right_ to be de-listed.

The Authors Guild lawsuit against Google Books ended in a 2015 ruling that Google Books is fair use and as such they don't have a right to be de-listed. It's not the case that they have a right to be de-listed but choose not to make use of it.

The same would apply if collation of data for machine learning datasets is found to be fair use.

> one has to argue they have _no_ such right to be “de-listed” in order to get any training data currently under copyright.

Datasets I'm aware of already have respected machine-readable opt-outs, so if that were to be legally enforced (as it is by the EU's DSM Directive for commercial data mining) I don't think it'd be the end of the world.

There's a lot of power in a default; the set of "everything minus opted-out content" will be significantly bigger than "nothing plus opted-in content" even with the same opinions.

enord|2 years ago

With the caveat that I was exactly wrong about the books de-listing, I feel you are making my point for me and retreating to a more pragmatic position about defaults.

The (quite entertaining) saga of Nightshade tells a story about what is going to be content creators “default position” going forward and everyone else will follow. You would be a fool not to, the AI companies are trying to end run you, using your own content, and make a profit without compensating you and leave you with no recourse.