enord | 2 years ago
An ML model can neither hold nor be in breach of copyright, so any discussion of how it works, and how that relates to how people work or "learn," is beside the point.
What actually matters is, first, the details of how the source material was collated, and second, the particular legal details surrounding attribution. The latter involves breaking new legal ground, and IANAL, so I will reserve judgement. The former, collation of source material for training, is emphatically not unexplored legal or moral territory. People are acting as if none of the established processes apply in the case of LLMs, and handwave about "learning" to defend it.
Ukv | 2 years ago
It is important (at both the training and generation stages) to distinguish between whether the model copies the original works or merely infers information from them, as copyright does not protect against the latter.
> The first part, collation of source material for training is emphatically not unexplored legal or moral territory.
Similar to Authors Guild v. Google, Inc., where Google internally made entire copies of millions of in-copyright books:
> > While Google makes an unauthorized digital copy of the entire book, it does not reveal that digital copy to the public. The copy is made to enable the search functions to reveal limited, important information about the books. With respect to the search function, Google satisfies the third factor test
Or the ongoing Thomson Reuters v. Ross Intelligence case, where the latter used the former's legal headnotes for training a language model:
> > verbatim intermediate copying has consistently been upheld as fair use if the copy is "not reveal[ed] to the public."
That it's an internal, transient copy is not inherently a free pass, but it is something the courts take into consideration, as stated more explicitly in Sega v. Accolade:
> > Accolade, a commercial competitor of Sega, engaged in wholesale copying of Sega's copyrighted code as a preliminary step in the development of a competing product [yet] where the ultimate (as opposed to direct) use is as limited as it was here, the factor is of very little weight
And, given that training a machine learning model is a considerably different purpose from what the images were originally intended for, it's likely to be considered transformative, as in Campbell v. Acuff-Rose Music:
> > The more transformative the new work, the less will be the significance of other factors
enord | 2 years ago