chacham15 | 7 days ago
The defense to training on copyrighted material is that it is the same as how humans learn from copyrighted material. The storage or reproduction is a red herring. Humans can also reproduce copyrighted works from memory. Showing that machines can reproduce copyrighted material is no different than saying that a human can reproduce copyrighted material that the human learned from.
The defense to actually reproducing a work is that in order to do so, the user has to "break" the system. It is the same as how you can make legal software do illegal things (e.g. using a screen recorder to "steal" a movie).
None of this is to say that these defenses are correct/moral, but rather that this article doesn't add any additional input into whether they are or aren't.
moregrist|7 days ago
Ultimately this is a matter for the courts and the law, but I'd just like to point out that a human memorizing a work, reproducing it, and distributing it is just as much a copyright violation as doing a more mechanical form of reproduction.
There's a reason that fan fiction routinely falls afoul of copyright. There's quite a lot of case law in this area, and hand-waving "humans can do it too" doesn't really make for a strong argument. Humans get in trouble for it ALL THE TIME. The consequences can be fines, injunctions, or even criminal liability.
I'm not sure why you think AI gets off the hook here. Just because you like the outcome at the moment?
duskwuff|7 days ago
aaroninsf|7 days ago
Humans are not judged on the basis of what they _can_ do.
Reasoning about how to constrain tools on the basis of what they _could_ do, if e.g. used outside their established guardrails, needs to be very nuanced.
gruez|7 days ago
Who's liable for the copyright infringement if you can coax it out of a system? If you can bypass paywalls by using Google's cache feature (or, since they got rid of it, by using carefully crafted queries to extract the entire text via snippets), is Google on the hook or the person doing it?
thisisit|7 days ago
The default assumption here seems to be that the system needs to be broken. This is similar to the Google defence. If a user's intent is to search for cracked software, what can poor Google do about it? The answer is to make it even more difficult.
This is a defence also used by torrent sites using magnet urls. “We don’t host files” is the default defence. But then if these sites get hit with DMCA they are required to remove the magnet url.
So the article shows what the lawyer is saying. Despite claims that it is difficult to search for full books, it really isn’t. It is trivial. When it goes to court, and it will, AI models will be required to make it even more difficult and to allow for DMCA-like takedowns.
tsimionescu|7 days ago
That's simply not true. No humans can memorize entire novels, as this research proved these models do. And definitely not all of these novels, and code bases, and who knows what else all at the same time.
vlabakje90|7 days ago
https://en.wikipedia.org/wiki/Hafiz_(Quran)
It's not far-fetched to think that people could recite books just like an LLM. I don't know why they'd want to, but that's neither here nor there.
gruez|7 days ago
Humans can, however, remember entire songs, and songs are definitely long enough to be considered copyright protected. There is still a difference in scale, but that's not really relevant when it comes to copyright law. You can't be like "well humans are committing copyright infringement but since it's limited to a few hundred words we'll give it a pass".
butlike|7 days ago
techblueberry|7 days ago
gcanyon|7 days ago
And of course AIs are excellent at taking direction, so:
If I prompt it with "Harry Potter, but Voldemort wins: dark, and Hermione is a sex slave to Draco Malfoy" and get "Manacled," that's copyright infringement, and on me, not on the LLM/training.
If I prompt it with "Harry Potter, but Voldemort wins: dark, and Hermione is a sex slave to Draco Malfoy, and change enough to avoid infringing copyright," and get "Alchemised," then that should be fine. I doubt the legal world agrees with me though.
freejazz|7 days ago
Yeah, it's something people say but it is severely lacking in evidence and credibility.
kgwgk|7 days ago