
> The research findings “could present a challenge to those who argue that the AI model does not store or reproduce any copyright works,” said Cerys Wyn Davies, an intellectual property partner at law firm Pinsent Masons.

The defense to training with copyright is that it is the same as how humans learn from copyrighted material. The storage or reproduction is a red herring. Humans can also reproduce copyrighted works from memory as well. Showing that machines can reproduce copyrighted material is no different than saying that a human can reproduce copyright material that the human learned from.

The defense to actually reproducing a work is that, in order to do so, the user has to "break" the system. It is the same as how you can make legal software do illegal things (e.g., using a screen recorder to "steal" a movie).

None of this is to say that these defenses are correct or moral, but rather that this article doesn't add anything new to the question of whether they are or aren't.


moregrist|7 days ago

> Humans can also reproduce copyrighted works from memory as well. Showing that machines can reproduce copyrighted material is no different than saying that a human can reproduce copyright material that the human learned from.

Ultimately this is a matter for the courts and the law, but I'd just like to point out that a human memorizing a work, reproducing it, and distributing it is just as much a copyright violation as doing a more mechanical form of reproduction.

There's a reason that fan fiction routinely falls afoul of copyright. There's quite a lot of case law in this area, and hand-waving "humans can do it too" doesn't really make for a strong argument. Humans get in trouble for it ALL THE TIME. The consequences can be fines, injunctions, or even criminal liability.

I'm not sure why you think AI gets off the hook here. Just because you like the outcome at the moment?

duskwuff|7 days ago

This isn't the defense you think it is. Performing a copyrighted work from memory - e.g. a piece of music, a poem, a story, etc - is still a copyright violation. There's no special protection for works that a human has memorized.

aaroninsf|7 days ago

The key word in the HN headline is _can_.

Humans are not judged on the basis of what they _can_ do.

Reasoning about how to constrain tools on the basis of what they _could_ do, if e.g. used outside their established guardrails, needs to be very nuanced.

gruez|7 days ago

>There's no special protection for works that a human has memorized.

Who's liable for the copyright infringement if you can coax it out of a system? If you can bypass paywalls by using Google's cache feature (or, since they got rid of it, by using carefully crafted queries to extract the entire text via snippets), is Google on the hook, or the person doing it?

thisisit|7 days ago

The whole “humans also do this” argument isn’t a winning defence here. Humans and copyright have a long history, with so much law that it is easy to get confused.

The default assumption here seems to be that the system needs to be broken. This is similar to the Google defence: if a user's intent is to search for cracked software, what can poor Google do about it? The answer is to make it even more difficult.

This defence is also used by torrent sites that serve magnet URLs. “We don’t host files” is the default defence. But if these sites get hit with a DMCA notice, they are required to remove the magnet URL.

So the article supports what the lawyer is saying. Despite claims that it is difficult to extract full books, it really isn’t. It is trivial. When this goes to court, and it will, AI companies will be required to make it even more difficult and to allow for DMCA-like takedowns.

tsimionescu|7 days ago

> Humans can also reproduce copyrighted works from memory as well

That's simply not true. No humans can memorize entire novels, as this research proved these models do. And definitely not all of these novels, and code bases, and who knows what else all at the same time.

vlabakje90|7 days ago

They absolutely can. Millions of people can recite the Quran verbatim, word for word. That's 77797 words. There is even a title for those people.

https://en.wikipedia.org/wiki/Hafiz_(Quran)

It's not far-fetched to think that people could recite books just like an LLM. I don't know why they'd want to, but that's neither here nor there.

gruez|7 days ago

>No humans can memorize entire novels, as this research proved these models do.

Humans can however, remember entire songs, and songs are definitely long enough to be considered copyright protected. There is still a difference in scale, but that's not really relevant when it comes to copyright law. You can't be like "well humans are committing copyright infringement but since it's limited to a few hundred words we'll give it a pass".

butlike|7 days ago

I was also skeptical, but musical works make more sense for that argument. Their premise is still flawed, though.

techblueberry|7 days ago

You can't pay a human to reproduce copyrighted material either.

gcanyon|7 days ago

But the crime in the human instance is the reproduction, not the storage. So the crime in the AI circumstance would not be in the training, but in prompting the output.

And of course AIs are excellent at taking direction, so:

If I prompt it with "Harry Potter, but Voldemort wins: dark, and Hermione is a sex slave to Draco Malfoy" and get "Manacled," that's copyright infringement, and on me, not on the LLM/training.

If I prompt it with "Harry Potter, but Voldemort wins: dark, and Hermione is a sex slave to Draco Malfoy, and change enough to avoid infringing copyright," and get "Alchemised," then that should be fine. I doubt the legal world agrees with me though.

freejazz|7 days ago

>The defense to training with copyright is that it is the same as how humans learn from copyrighted material.

Yeah, it's something people say but it is severely lacking in evidence and credibility.

kgwgk|7 days ago

What calculus?