
uvnq|2 years ago

By this logic, every writer who learned to read and write by reading books, every artist who improved their craft by studying other works, and every musician who learned piano by practicing pieces is also "stealing" in whatever they "originally" create, because they learned by recognizing patterns in "tiny pieces of every work" in their data set. It's ridiculous to say that agents that generalize well are "stealing" pieces of the works they used to learn those generalizations. Obviously, if an artist memorizes a painting from their data set and reproduces it, or an AI spits out an exact image from its training data instead of an original work based on what it has learned, that is theft. But generalization is not theft, at least in my view. To assume otherwise leads to some very dysfunctional logical conclusions.

naet|2 years ago

I think this is a fallacy that I see a lot in recent AI discussions. An LLM is not the same as a human brain. You might see some superficial similarities in both being able to produce a block of text, but the method by which the text is produced is entirely different. For example, we can't download entire libraries of books instantly to our brains and then reproduce those books word for word in memory. Things that operate at different scales and by different methods should have different regulations, in the same way a bike or a car is regulated differently from a truck or another piece of heavy machinery.

Also, humans can be, and often are, found liable for copyright infringement or piracy depending on how they conduct themselves. If a human were to reproduce a copyrighted book word for word, that would constitute copyright infringement regardless of whether it was done by rote memory, by copy and paste, or assisted by a black-box LLM. Even if a human paraphrases another work, they can still be found guilty of plagiarism if the paraphrase is overly similar to the original source material. A human can also be guilty of copyright infringement if they use a copyrighted work as source material in certain ways. If I steal a stock image without paying for a license and add it to my Photoshop collage, I might be found to have pirated or infringed on the original creator's property.

LLMs are trained on copyrighted data and can often reproduce that data. It's an open question how we regulate this.

I personally think it would be fair for an artist or author to say their work was not licensed to be used in training a neural net or otherwise request to opt out.

nate_meurer|2 years ago

> For example, we can't download entire libraries of books instantly to our brains and then reproduce those books word for word in memory.

Yeah. Nobody's talking about word-for-word duplication here.

> If a human were to reproduce a copyrighted book word for word...

Again?

> Even if a human paraphrases another work, they can still be found guilty of plagiarism if the paraphrase is overly similar to the original source material.

Go look up the dictionary definition of plagiarism. Notice the most crucial element, which you seem to have omitted here, and also notice that it's irrelevant to AI systems, which overtly acknowledge that they exist to generate derivative works.

> If I steal a stock image without paying for a license

Here's another version of your "word-for-word" analogy, which nobody else is talking about.

> I personally think it would be fair for an artist or author to say their work was not licensed to be used in training a neural net or otherwise request to opt out.

I am genuinely curious: how do you propose to enforce this?

fennecfoxy|2 years ago

Another humans are special and different argument.

Consider an inevitable AGI with autonomy and no ties to a corporation. Can it not learn and write text, whether in conversation, creatively, or academically? If it's held to the same copyright laws that humans are, then of course it's fine, in my mind. Hell, AI will be _better_ at avoiding infringement than humans are, since it can store so much knowledge and check new output against existing works (e.g., by searching the internet for similar ones).

If this is still a problem then doesn't this just boil down to racism against the machine?

chii|2 years ago

> LLMs ... can often reproduce that copyrighted data. It's an open question how we regulate this.

Why isn't existing copyright protection sufficient to regulate this? Photoshop can be used today to reproduce copyrighted data just as well.

This has nothing to do with "license to train". I do not believe existing copyright holders have this right granted to them by law - it is a right that is given to society for all works.

An artist learning a style, and producing another piece in the same style, is allowed today. This should be allowed, regardless of whether it is done via using an AI, or via years of training.

june_twenty|2 years ago

Using your logic, I could "learn" a book by saving each sentence to my database. Then, when requested by someone else, I could regurgitate the book sentence by sentence from my db, and also charge the person for it.
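The verbatim-storage scenario above is easy to sketch. Here's a minimal Python toy (the sample sentences and the dict "database" are invented for illustration) showing why this is exact copying rather than generalization:

```python
# Hypothetical illustration: "learning" by saving every sentence verbatim.
book = [
    "Call me Ishmael.",
    "Some years ago I went to sea.",
]

# The "database": each sentence stored exactly as-is, keyed by position.
db = {i: sentence for i, sentence in enumerate(book)}

def regurgitate(db):
    """'Generate' the book by replaying stored sentences in order."""
    return " ".join(db[i] for i in sorted(db))

# The output is a byte-for-byte copy of the input: no abstraction,
# no generalization, just retrieval.
assert regurgitate(db) == " ".join(book)
```

Nothing here resembles learning a style or a pattern; the only thing the system can ever emit is the stored text itself, which is the point of the reductio.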

gumballindie|2 years ago

AI is not writers, artists, or developers; it's software that ingests data and generates an output. Anthropomorphism is going out of fashion fast.

Art9681|2 years ago

Before we can agree on anything, we have to define what qualifies a human as any of those things. We will be here all night debating that.

- "They call themselves an artist but they are not that good."
- "Copying and pasting shell scripts in a terminal does not a software developer make."
- "The story that writer created is yet another permutation of <insert tale as old as time>"

You see what I mean? There is a good probability that, today alone, a significant percentage of the content you saw online was AI generated, and you were none the wiser and thought nothing of it.

fennecfoxy|2 years ago

It isn't now, but as we approach an inevitable singularity, whether it's in 100 years or 1000 years, what then?

Are we gonna fall prey to the sci-fi trope of "let's all be racist to machines"? If so, I wouldn't hold it against AGIs to fall prey to the sci-fi trope of "I am gonna be evil now, bye bye humans".

Gigachad|2 years ago

If the output is indistinguishable, how can this matter? If I publish a work, how can it be copyright infringement when generated by AI, but not if I came up with the exact same output myself?

How would you even know which way I did it?

uvnq|2 years ago

It's not anthropomorphism. They are both generalizing agents. It is an accurate and meaningful comparison.

Bran_son|2 years ago

"Stealing" is an action that does not depend on who or what the perpetrator is.