top | item 38786490

hhsectech | 2 years ago

I'm not for or against anything at this point until someone gets their balls out and clearly defines what copyright infringement means in this context.

If you give a kid a bunch of books, all by the same author, and then pay that kid to write a book in a similar style, and then I go on to sell that book... have I somehow infringed copyright?

The kid's book is, at best, likely to be a very convincing facsimile of the original author's work... but not the author's work.

It seems to me that the only solution for artists is to charge for access to their work in a secure environment, then lobotomise people on the way out.

The endgame seems to be "you can view and enjoy our work, but if you want to learn from or be inspired by it, that's not on"

discuss

twoodfin | 2 years ago

There are two problems with the “kid” analogy:

a) In many closely comparable scenarios, yes, it's copyright infringement. When Francis Ford Coppola made The Godfather film, he couldn't just be "inspired" by Puzo's book. If the story or characters or dialog are similar enough, he had to pay Puzo, even if the work he created was quite different and not a literal "copy".

b) Training an LLM isn’t like giving someone a book. Among other things, it involves making a derivative copy into GPU memory. This copy is not a transitory copy in service of a fair use, nor likely a fair use in itself, nor licensed by the rights-holder.

andy99 | 2 years ago

> This copy is not a transitory copy in service of a fair use

Training is almost certainly fair use, so it's exactly a transitory copy in service of fair use. Training, other than the brief "transitory copy" you mention, is not copying; it's making a minuscule algorithmic adjustment based on fleeting exposure to the data.

EarthMephit | 2 years ago

> If the story or characters or dialog are similar enough, he has to pay Puzo, even if the work he created was quite different and not a literal “copy”.

I don't think you can copyright a plot or story in any country, can you?

If he had re-written the story with different characters and different lines, he wouldn't have had to pay Puzo. I'm sure it would have been frowned upon if it were too close, but legally OK.

randombits0 | 2 years ago

>This copy is not a transitory copy in service of a fair use, nor likely a fair use in itself,

Seems vastly transitory and since the output cannot be copyrighted, does no harm to any work it “trained” on.

fennecbutt | 2 years ago

How is it a copy at all? Surely the model weights would therefore be much larger than the corpus of training data, which is not the case at all.

If it disgorges parts of NYT articles, how do we know it's not a common phrase, or that the article isn't reproduced verbatim on another, unpaywalled site?

I agree that if it uses the whole content of their articles for training, then NYT should get paid, but I'm not sure that they specifically trained on "paid NYT articles" as a topic, though I'm happy to be corrected.

I also think that companies and authors extremely overvalue the tiny fragments of their work in the huge pool of training data; there's a bit of a "main character" vibe going on.

PaulDavisThe1st | 2 years ago

Regarding (b) ... while a specific method of training that involved persistent copying may indeed be a violation, it is far from clear that the general notion of "send server request for URL, digest response in software that is not a browser" is automatically a violation. If there is deemed to be a difference (i.e. all you are allowed to do without a license is have a human read it in a browser), then one can see training mechanisms changing to accommodate that.

soerxpso | 2 years ago

I don't have a comment on your hypothetical, but this case seems to go far beyond that. If you read the actual filing at the bottom of the linked page, NYT provides examples where ChatGPT recited exact multi-paragraph sections of their articles and tried to pass them off as its own words. Plainly reproducing a work is pretty much the only situation where "is this copyright violation?" isn't really in flux. It's not dissimilar to selling PDFs of copyrighted books.

If NYT were relying entirely on the argument that training a model in wordcraft using their materials is always copyright violation, or only had short quotes to point to, the philosophical debate you're trying to have would be more relevant.

incangold | 2 years ago

Importantly, the kid, an individual human, got some wealth somewhat proportional to their effort. There's non-trivial effort in recruiting the kid. We can't clone the kid's brain a million times and run it for pennies.

There are differences, ethical, political and otherwise, between an AI doing something and a human doing the exact same thing. Those differences may need to be reflected in new laws.

IANAL and don't have any positive suggestions for good laws, just pointing out that the analogy doesn't quite hold. I think we're in new territory where analogies to previous human activities aren't always productive.

flextheruler | 2 years ago

I think you’re skipping over the problem.

In your example, you owned the work you gave to the person to create derivatives of.

A more accurate example would be stealing those books and then giving them to someone else to create derivatives.

slyall | 2 years ago

How about if I borrowed them from the library and gave them to the kid to read?

How about if I got the kid to read the books on a public website where the author made the books available for free?

graphe | 2 years ago

Ironically, these artists can't claim to be wholly original, as they were certainly inspired themselves. Artists that play live already "lobotomize" people on their way out, since it's not easy to recreate an experience and a video isn't the same if it's a good show.

Artists that make easily reproducible art will see it circulate as it always has, alongside AI output, in a sea of other jpgs.

sulrich | 2 years ago

You might be well served by reading the actual complaint.

OOPMan | 2 years ago

I think your kid analogy is flawed because it ignores the fact that you couldn't reasonably use said "kid" to rapidly produce thousands of works in the same style and then flood the market with them, drowning out the original author's presence.

Try this with a real "kid" and you'll run into all kinds of real-world constraints, whereas flooding the world with derivative drivel using LLMs is something that's actually possible.

So yeah, stop using weak analogies; they're not helpful or intelligent.