top | item 34476152

flawi | 3 years ago

This is literally what the AI does as well. It didn't walk into a bookstore and steal all the books off the shelf, it read through material made available to it entirely legally.

What authors are arguing here is that they should get to control what type of entity should be allowed to view the work they purchased. It's the same as saying, "You bought my book, but now that I know you're a communist, I think the courts should ban you from reading it."

denton-scratch|3 years ago

> they should get to control what type of entity should be allowed to view the work they purchased

No, that's not it. It's more like if I memorized a bunch of pop songs, then performed a composition of my own whose second verse was a straight lift of a song by Madonna. I would owe her performance royalties. And I would be obliged to reproduce her copyright notice, so that my audience would know that if they pull the same stunt, they're on the hook for royalties too.

Dylan16807|3 years ago

There are lots of people arguing against the training itself. And people arguing against all outputs, even when there is no detectable copying. I don't know how you missed those takes. You're arguing the wrong point here. Many people do want to say "no AI can look".

htfu|3 years ago

Only if you released it. You could definitely perform it in the shower without owing anything. And the 99% of your compositions that didn't wholesale mirror any specific song would be perfectly fine to release.

Now, shifting culpability from the model creator to the user would obviously be problematic as well, since users have no way of knowing whether the output is novel or a copy-paste. Some sort of filter would seem to be the solution: it should discard output that exactly or almost exactly matches any input.
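To make the filter idea concrete, here is a minimal sketch of one way it could work, assuming "almost exactly matches" is measured as the share of an output's word n-grams that also occur somewhere in the training text. The function names, the n-gram length and the 0.8 threshold are all hypothetical choices for illustration, not a description of how any real model or service filters its output.

```python
import re


def word_ngrams(text, n):
    """Return the set of lowercase word n-grams in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_index(training_texts, n=5):
    """Collect every n-gram seen in the training texts (assumed to fit in memory)."""
    index = set()
    for text in training_texts:
        index |= word_ngrams(text, n)
    return index


def copied_fraction(output, index, n=5):
    """Fraction of the output's n-grams that also appear in the training index."""
    grams = word_ngrams(output, n)
    if not grams:
        return 0.0  # output shorter than n words: nothing to compare
    return len(grams & index) / len(grams)


def should_discard(output, index, n=5, threshold=0.8):
    """Hypothetical policy: drop outputs that are mostly verbatim training text."""
    return copied_fraction(output, index, n) >= threshold
```

A real system would need something far more scalable than an in-memory set, and the threshold is a judgment call: set it too low and common phrases get blocked, too high and lightly edited copies slip through.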

Winsaucerer|3 years ago

But it's not humans reading it; it's being used to train ML models. There are similarities between humans learning from books and ML models being trained on them, but there are also salient differences, and those differences lead to concerns. E.g., I am concerned about these large tech companies being the gatekeepers of AI models, and I would rather see the beneficiaries and owners of these models also be the many millions or billions of content creators who first made them possible.

It's not obvious to me that the implicit permission we've been granting for humans to view our content for free also means that we've given permission for AI models to be trained on that data. You don't automatically have the right to take my content and do whatever you like with it.

I have a small inconsequential blog. I intended to make that material available for people to read for free, but I did not have (but should have had!) the foresight to think that companies would take my content, store it somewhere else, and use it for training their models.

At some point I'll be putting up an explicit message on my blog denying permission to use it for ML training, unless the model being trained is an appropriately open-sourced and available model that benefits everyone.

chii|3 years ago

> You don't automatically have the right to take my content and do whatever you like with it.

Actually, you don't have the right to restrict the content, except as allowed by copyright law (those rights are spelled out: distribution, public broadcasting, making derivative works).

Specifically, you cannot restrict me from reading the works and learning from them.

Imagine a hypothetical scenario: I bought your book, counted the words and letters to compile some sort of index/table, and published that. Not a very interesting work, but it is transformative, and thus you do not own the copyright to my index/table. You cannot even prevent me from doing the counting and publishing.
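For illustration only, the kind of word and letter table described above could be produced with something like the following sketch. The function name `count_table` and the tokenization rule (lowercase runs of letters and apostrophes count as words) are assumptions made for the example.

```python
from collections import Counter
import re


def count_table(text):
    """Return (word_counts, letter_counts) for a piece of text."""
    lowered = text.lower()
    # Words: runs of letters and apostrophes (an assumed, simplistic rule).
    words = Counter(re.findall(r"[a-z']+", lowered))
    # Letters: every alphabetic character, ignoring spaces and punctuation.
    letters = Counter(ch for ch in lowered if ch.isalpha())
    return words, letters
```

The resulting table records frequencies only; the original sentences cannot be reconstructed from it.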

alpaca128|3 years ago

> It didn't walk into a bookstore and steal all the books off the shelf, it read through material made available to it entirely legally.

GitHub ignored the licenses of countless repos and simply took everything posted publicly for training. They didn't care whether it was available to them entirely legally; they just pretended that copyright doesn't exist for them.

Dylan16807|3 years ago

Isn't the definition of public repo that anyone is allowed to download and read it?