wilsynet | 2 years ago

The NYT argument is going to be that they put up a site, own the copyright for their content, and make that content available either for a human to read for themselves or for software to index for something commonly understood as a search engine. Those terms do not permit training LLMs for commercial use. Therefore, cease and desist. Oh, and destroy anything that was created by violating the terms of our license.

You can make arguments like a) what is ChatGPT but a different kind of search engine, or b) what is an LLM but a primitive human, or c) but but uhh we didn’t agree to these terms.

But I do not think those arguments will prevail.

paulmd|2 years ago

The LinkedIn case already proves that you cannot impose conditions on works you freely serve to the public. The data is available to anyone who sends a request (you don’t even need to be logged in), and if they do something you don’t like with it then oh well.

So if that’s the argument it’s already been argued by LinkedIn and lost.

This is one of those things where copyright holders have gotten absurdly full of themselves though. Like what you’ve said is that copyright holders have the right to impose a contract of adhesion on data that they are broadcasting into the public without any idea with whom they are even forming a contract, and that’s a facially absurd and incredibly noxious idea if you follow it to the conclusions it implies.

Copyright is about securing to the public works of significance and encouraging their creation, and the way it’s become a lifetime-plus-70-year guarantee of intellectual ownership of ideas is fundamentally noxious and goes against the intent and spirit of the idea. And if that’s where the copyright regime is headed then I’d rather see ChatGPT kill off copyright entirely.

sumtechguy|2 years ago

NYT will have to prove that the derivative work is still theirs. Just violating the license may not be enough. That could be bad by itself I guess. But given that the interactive prompt can produce a wild number of variations of 'not NYT stuff', it will be tough to say what sort of damages apply.

A similar sort of issue popped up in the 80s around colorization of films. https://www.latimes.com/archives/la-xpm-1987-06-20-ca-8405-s... https://chart.copyrightdata.com/Colorization.html

The answer may be 'maybe'? From what I read, they basically split the decision down to an 'I know it when I see it' style of ruling. If the copyright is still in effect then NYT owns that portion of the output but not the other parts, as the secondary effect would be owned by the generator company (in this case OpenAI) or the person who prompted for it. If that is the case, NYT would have to prove what parts (nodes? backreferences? weights?) they own?

ojosilva|2 years ago

Terms of Use are a thing, and if the Times can prove that OpenAI infringed their web terms by scraping, they may have a case... but terms of use probably won't monetize well or give them enough leverage to prevent OpenAI from using their data anyway, and may end up distracting from the main copyright suit.

colejohnson66|2 years ago

Violating TOS, at least to scrape and use later, is legal.[0] I'm not sure how the ruling interacts with LLMs, but I'm sure OpenAI's lawyers would bring it up.

[0]: https://www.forbes.com/sites/zacharysmith/2022/04/18/scrapin...