top | item 47125338

yathern | 7 days ago

Hmmm, I think you're sort of right, but not entirely. It's true that a novel consists of a valid sequence of tokens, and that a model could in principle be made to output that sequence. But when you say this:

> So there should be a prompt that can cause that sequence to be output

That's where I think I might disagree. For example, the odds of predicting the next sentence of, say, Harry Potter verbatim should be astronomically low for the large majority of the book; if they weren't, it'd be a pretty boring book. The fact that a model can do this with relative ease means it has been trained on the material.
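The "astronomically low" intuition is just compounding probabilities: the chance of emitting an exact token sequence is the product of the per-token probabilities, so even high per-token confidence decays fast with length. A minimal sketch (the 0.9 per-token figure and sequence length are assumed numbers purely for the arithmetic, not measurements from any real model):

```python
import math

def sequence_logprob(token_probs):
    """Log-probability of emitting an exact token sequence:
    the product of per-token probabilities, summed in log space
    to avoid underflow for long sequences."""
    return sum(math.log(p) for p in token_probs)

# Even if a model assigns 90% probability to every "correct" next
# token, a 50-token sentence comes out verbatim only ~0.5% of the time.
probs = [0.9] * 50
print(math.exp(sequence_logprob(probs)))  # ~0.00515
```

So sustained verbatim reproduction over paragraphs is strong evidence of memorization rather than lucky next-token guessing.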

The issue at hand is copyright and intellectual property. If the goal of copyright is to protect the author's IP, then LLMs can act like an IP money-laundering scheme: the black box has consumed the IP and can emit it again. The whole concept of IP is a little philosophical and muddy, with lots of grey area for fair use, parody, inspiration, and adaptation. But it gets very odd when we consider models that can adapt and reuse IP at massive scale.
