From what I understood, the case against OpenAI wasn't about the summarisation. It was about the fact that the AI was trained on copyrighted work. In the case of Wikipedia, the assumption is that someone purchased the book, read it, and then summarised it.
They’re sort of separate. In a sense you could say that the ChatGPT model is a lossily compressed version of its training corpus. We acknowledge that a JPEG of a copyrighted image is a violation, even though the compression is lossy. If the model can recite Harry Potter close to word for word, even imperfectly, this is evidence that the model itself is an encoding of the book (among other things).
You hear people saying that a trained model can’t be a violation because humans can recite poetry, etc., but a transformer model is not human and, importantly both philosophically and economically, human brains can’t be copied and scaled.
That doesn't really make sense. Just because you purchased a book does not mean the copyright goes away for new works based on the book. (For the physical book you bought, the doctrine of first sale gives you some rights, but only in that specific physical copy.) If OpenAI pirated material, that would be a separate issue from whether the output of the LLM is infringing.
colechristensen|3 months ago
One is a large volume of pirated content used to train models.
Another is models reproducing copyrighted materials when given prompts.
In other words, there's the input issue and the output issue, and those two issues are separate.
cameldrv|3 months ago
bawolff|3 months ago
throwaway-0001|3 months ago
jen729w|3 months ago
https://authorsguild.org/advocacy/artificial-intelligence/wh...