jmkb | 2 years ago

Eventually, I imagine a new licensing concept will emerge, similar to the idea of music synchronization rights -- maybe call it "training rights." It won't matter whether the text was purchased or pirated -- just like it doesn't matter now if an audio track was purchased or pirated when it's mixed into a movie soundtrack.

Talent agencies will negotiate training rights fees in bulk for popular content creators, who will get a small trickle of income from LLM providers, paid by a fee line-itemed into the API cost. Indie creators' training rights will be violated willy-nilly, as they are now. Large for-profit LLMs suspected or proven as training rights violators will be shamed and/or sued. Indie LLMs will go under the radar.

fweimer|2 years ago

Is it all that different from indexing for search? That does not seem to require a license from the copyright holder under U.S. law (but other countries may treat it as a separate exploitation right). If indexing for search is acceptable, then something that is intended to be more transformative should be legal as well.

(Personally, I think that even indexing for search should require permission from the copyright holder.)

phkahler|2 years ago

>> Talent agencies will negotiate training rights fees in bulk for popular content creators

AFAICT there is no legal recognition of "training rights" or anything similar. First sale right is a thing, but even textbooks don't get extra rights for their training or educational value.

belorn|2 years ago

Many legal concepts used by courts have no legal recognition in the law texts. Much of legal practice is just precedents, policies, customs, and doctrines.

The parent comment mentions music synchronization rights, and this concept does not exist in copyright. Courts do occasionally mention it, and lawyers talk about it, but in terms of legal recognition there is basically only the law text that defines derivative work and fair use. One way to interpret it is that courts have precedents that treat music synchronization as a derivative work that does not fall under fair use.

Using textbooks in training/education is not as black and white as one may assume. Take this Berkeley page (https://teaching.berkeley.edu/resources/course-design/using-...). Copying in this context includes using pages for slides and during lectures (a slightly larger scope than making physical copies on paper). In obvious cases the answer is likely obvious, but in others it will be more complex.

sigstoat|2 years ago

This is why jmkb referenced synchronization rights, which (as I recall) were invented when they seemed useful. jmkb is suggesting a new right might be created, not claiming that it already exists.

(even if it wasn’t sync rights, there was something else musically related that was created in response to technological development. wikipedia will have plenty on it)

a_wild_dandan|2 years ago

I suspect the opposite outcome is also plausible: the LLM is viewed analogously to a blog author. The blogger/LLM may consume a book, subsequently produce "derived" output (generated text), and thus generate revenue for the blogger/LLM's employer. Consequently, the blogger/LLM's output -- while "derived" in some sense -- differs enough to be considered original work, rather than "derivative work" (like a book's film adaptation). Auditing how the blogger/LLM consumed relevant material is thus absurd.

Of course, this line of reasoning hinges on the legitimacy of an "LLM agent <-> blogger agent" type of analogy. I suspect the equivalence will become more natural as these AI agents continue to rapidly gain human-like qualities. How acceptable that perspective would be now, I have no idea.

In contrast, if the output of a blogger is legally distinct from an AI's, the consequences quickly become painful.

* A contract agency hires Anne to practice play recitals verbally with a client. Does the agency/Anne owe royalties for the material they choose? What if the agency was duped, and Anne used -- or was -- a private AI which did everything?

* How does a court determine if a black box AI contains royalty-requiring training material? Even if the primary sources of an AI's training were recorded and kosher, a sufficiently large collection of small quotes could be reconstructed into an author's story.

* What about AIs which inherit (weights, or training data generated) from other AIs of unknown training provenance? Or which were earlier trained on some materials with licenses that later changed? Or AIs that recursively trained their successors using copyrighted works which they reconstructed from legal sources? When do AIs become infected with illegal data?

The business of regulating learning differently depending on whether the agent uses neurons or transistors seems... fraught. Perhaps there's a robust solution for policing knowledge w.r.t. silicon agents. If you have an idea, please share!

the8472|2 years ago

Humans are also trained on copyrighted content they see. Should every artist have to pay that fee too on every work they create?

Disney will finally be able to charge a "you know what the mouse looks like" tax.

nyolfen|2 years ago

i don't understand why a new licensing regime would be necessary, the model is clearly a fair use derivative work. it does exactly what a human does -- observes information, distills it into symbolic systems of meaning, and produces novel content that exists in the same semantic universe as its experiences.