item 45923586

chupchap | 3 months ago

From what I understood, the case against OpenAI wasn't about the summarisation. It was about the fact that the AI was trained on copyrighted work. In the case of Wikipedia, the assumption is that someone purchased the book, read it, and then summarised it.


colechristensen | 3 months ago

There are separate issues.

One is a large volume of pirated content used to train models.

Another is models reproducing copyrighted materials when given prompts.

In other words, there's the input issue and the output issue, and those two issues are separate.

cameldrv | 3 months ago

They’re sort of separate. In a sense you could say that the ChatGPT model is a lossily compressed version of its training corpus. We acknowledge that a JPEG of a copyrighted image is a violation. If the model can recite Harry Potter word for word, even imperfectly, this is evidence that the model itself is an encoding of the book (among other things).

You hear people saying that a trained model can’t be a violation because humans can recite poetry, etc., but a transformer model is not human, and — philosophically and economically importantly — human brains can’t be copied and scaled.

bawolff | 3 months ago

That doesn't really make sense. Just because you purchased a book does not mean the copyright goes away for new works based on the book. For the physical book you bought, the doctrine of first sale gives you some rights, but only in that specific physical copy. If OpenAI pirated material, that would be a separate issue from whether the output of the LLM is infringing.