conjectures | 11 months ago

It does apply to people? When you read a copy of a book, you can't be sued for making a copy of the book in the synapses of your brain.

Now, if you have eidetic memory and write out large chunks of the book from memory and publish them, that's what you could be sued for.

tsimionescu|11 months ago

This is not about memory or training. The LLM training process is not being run on books streamed directly off the internet or from real-time footage of a book.

What these companies are doing is:

1. Obtain a free copy of a work in some way.

2. Store this copy in a format that's amenable to training.

3. Train their models on the stored copy, months or years after step 1 happened.

The illegal part happens in steps 1 and/or 2. Step 3 is debatable: it may be fair to argue that the model learns in the same sense as a human reading a book, so the model itself is perhaps not illegally created.

But the training set that the company is storing is full of illegally obtained or at least illegally copied works.

What they're doing before the training step is exactly like building a library by walking into a bookshop with a portable copier and copying every book in it.

visarga|11 months ago

But making copies for yourself, without distributing them, is different than making copies for others. Google is downloading copyrighted content from everywhere online, but they don't redistribute their scraped content.

Even web browsing implies making copies of copyrighted pages: we can't tell the copyright status of a page without loading it, at which point a copy has already been made in memory.

triceratops|11 months ago

> When you read a copy of a book

They're not talking about reading a book FFS. You absolutely can be sued for illegally obtaining a copy of the book.