(no title)
shkkmo | 2 months ago
Right, because you would have done more than learn: you would have gone past learning and used that learning to reproduce the work.
It works exactly the same for an LLM. Training the model on content you have legal access to is fine. Afterwards, someone using that model to produce a replica of that content is engaged in copyright infringement.
You seem set on conflating the act of learning with the act of reproduction. You are allowed to learn from copyrighted works you have legal access to, you just aren't allowed to duplicate those works.
sirwhinesalot | 2 months ago
If someone hires me to write some code, and I give them GPLed code (without telling them it is GPLed), I'm the one who broke the license, not them.
shkkmo | 2 months ago
I don't think this is legally true. The law isn't fully settled here, but things seem to be moving towards the LLM user being the holder of the copyright of any work produced by that user prompting the LLM. It seems like this would also place the infringement onus on the user, not the provider.
> If someone hires me to write some code, and I give them GPLed code (without telling them it is GPLed), I'm the one who broke the license, not them.
If you produce code using an LLM, you (probably) own the copyright. If that code is already GPL'd, you would be the one engaged in infringement.
Yeask | 2 months ago
[deleted]
zephen | 2 months ago
shkkmo | 2 months ago
"Learning" is an established word for this, happy to stick with "training" if that helps your comprehension.
> LLMs don't "learn" but they _do_ in some cases, faithfully regurgitate what they have been trained on.
> Legally, we call that "making a copy."
Yes, when you use an LLM to make a copy... that is making a copy.
When you train an LLM... that isn't making a copy, that is training. No copy is created until output is generated that contains a copy.