mattxxx | 9 months ago

Well, firing someone for this is super weird. It seems like an attempt to censor an interpretation of the law that:

1. Criticizes a highly useful technology
2. Matches a potentially outdated, strict interpretation of copyright law

My opinion: I think using copyrighted data to train models for sure seems classically illegal. Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against. When I look at the litany of derivative fantasy novels, it's obvious they're not all fully independent works.

Since AI is and will continue to be so useful and transformative, I think we just need to acknowledge that our laws did not accommodate this use case, and then we should change them.

madeofpalk|9 months ago

> Humans can read a book, get inspiration, and write a new book and not be litigated against

Humans get litigated against for this all the time. There is such a thing as, charitably, being too inspired.

https://en.wikipedia.org/wiki/List_of_songs_subject_to_plagi...

jrajav|9 months ago

If you follow these cases more closely over time you'll find that they're less an example of humans stealing work from others and more an example of typical human greed and pride. Old, well established musicians arguing that younger musicians stole from them for using a chord progression used in dozens of songs before their own original, or a melody on the pentatonic scale that sounds like many melodies on the pentatonic scale do. It gets ridiculous.

Plus, all art is derivative in some sense, it's almost always just a matter of degree.

zelphirkalt|9 months ago

The law covers these cases pretty well; it is just that the law has very powerful, extremely rich adversaries whose greed has gotten the better of them again and again. They could use work released sufficiently long ago to be legally available, or they could take work released under Creative Commons, or they could run a lookup to make sure they never output verbatim copies of inputs, or outputs that are within a certain string edit distance of them (depending on output length), or they could have paid people to reach out to everyone whose work they are infringing upon. But they didn't do any of that, of course, because they think they are above the law.
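
The lookup idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not anything a vendor is known to run: the names `edit_distance` and `too_close`, the `corpus` list, and the 10% `max_ratio` threshold are all hypothetical, and a real system would presumably need an index rather than a linear scan over every work.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def too_close(output: str, corpus: list[str], max_ratio: float = 0.1) -> bool:
    """Return True if `output` is within a length-dependent edit distance
    of any work in `corpus`. The 10% default is an arbitrary assumption."""
    threshold = int(max_ratio * len(output))
    return any(edit_distance(output, work) <= threshold for work in corpus)
```

A caller would run `too_close(candidate_text, corpus)` before releasing an output and suppress or regenerate it when the check returns `True`. Scaling the threshold with output length means short outputs must match almost exactly, while longer ones are flagged even with a few changed characters.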

nadermx|9 months ago

I'm confused, so you're saying it's illegal? Because last I checked it's still in the process of going through the courts. And lest we forget, copyright's purpose is to advance the arts and sciences. Fair use is codified into law, which states that each case is judged on a use-by-use basis, hence the litigation to determine whether it is, in fact, legal.

ashoeafoot|9 months ago

Obviously a revenue tracking weight should be trained in allowing the tracking and collection of all values generated from derivative works.

hochstenbach|9 months ago

Humans are not allowed to do what AI firms want to do. That was one of the copyright office arguments: a student can't just walk into a library and say "I want a copy of all your books, because I need them for learning".

Humans are also very useful and transformative.

timdiggerm|9 months ago

Or we could acknowledge that something could be a bad idea, despite its utility

ceejayoz|9 months ago

> Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against.

You're still not gonna be allowed to commercially publish "Hairy Plotter and the Philosophizer's Rock".

WesolyKubeczek|9 months ago

No, but you are most likely allowed to commercially publish "Hairy Potter and the Philosophizer's Rock", a story about a prehistoric community. The hero is literally a hairy potter who steals a rock from a lazy deadbeat dude who is pestering the rest of the group with his weird ideas.

anigbrowl|9 months ago

You are if it's parody, cf 'Bored of the Rings'.

ActionHank|9 months ago

Assuming this means copyright is dead, companies will be very upset and patents will likely follow.

The hold US companies have on the world will be dead too.

I also suspect that media piracy will be labelled as the only reason we need copyright, an existing agency will be bolstered to address this concern and then twisted into a censorship bureau.

regularjack|9 months ago

Then they need to be changed for everyone and not just AI companies, but we all know that ain't happening.

dns_snek|9 months ago

The problem with this kind of analysis is that it doesn't even try to address the reasons why copyright exists in the first place. The belief that training LLMs on content without permission should be allowed is incompatible with the belief that copyright is useful; you really have to pick a lane here.

Go back to the roots of copyright and the answers should be obvious. According to the US constitution, copyright exists "To promote the Progress of Science and useful Arts" and according to the EU, "Copyright ensures that authors, composers, artists, film makers and other creators receive recognition, payment and protection for their works. It rewards creativity and stimulates investment in the creative sector."

If I publish a book and tech companies are allowed to copy it, use it for "training", and later regurgitate the knowledge contained within to their customers then those people have no reason to buy my book. It is a market substitute even though it might not be considered such under our current copyright law. If that is allowed to happen then investment will stop and these books simply won't get written anymore.

p0w3n3d|9 months ago

it's funny how a law becomes potentially-outdated only when big corporations want to violate it on a global scale.

As a private person I no longer feel incentivised to create new content online because I think that all I create will eventually be stolen from me...

franczesko|9 months ago

> Piracy refers to the illegal act of copying, distributing, or using copyrighted material without authorization. It can occur in various forms

Processing of IP without a license AND offering it as a model for money doesn't seem like an unknown use-case to me

SilasX|9 months ago

>My opinion: I think using copyrighted data to train models for sure seems classically illegal. Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against. When I look at the litany of derivative fantasy novels, it's obvious they're not all fully independent works.

Huh? If you agree that "learning from copyrighted works to make new ones" has traditionally not been considered infringement, then can you elaborate on why you think it fundamentally changes when you do it with bots? That would, if anything, seem to be a reversal of classic copyright jurisprudence. Up until 2022, pretty much everyone agreed that "learning from copyrighted works to make new ones" is exactly how it's supposed to work, and would be horrified at the idea of having to separately license that.

Sure, some fundamental dynamic might change when you do it with bots, but you need to make that case in an enforceable, operationalized way.

bitfilped|9 months ago

Sorry but AI isn't that useful and I don't see it becoming any more useful in the near term. It's taken since ~1950 to get LLMs working well enough to become popular and they still don't work well.

palmotea|9 months ago

[deleted]

ulbu|9 months ago

these comparisons of llms with human artists copying are just ridiculous. it’s saying “well humans are allowed to break twigs and damage the planet in various ways, so why not allow building a fucking DEATH STAR”.

abstracting llms from their operators and owners and possible (and probable) ends and the territories they trample upon is nothing short of eye-popping to me. how utterly negligent and disrespectful of fellow people must one be at the heart to give any credence to such arguments

jobigoud|9 months ago

We are talking about the rights of the humans training the models and the humans using the models to create new things.

Copyright only comes into play on publication. It's only concerned about publication of the models and publication of works. The machine itself doesn't have agency to publish anything at this point.

Suppafly|9 months ago

>The fatal flaw in your reasoning: machines aren't humans.

I don't see how that affects the argument. The machines are being used by humans. Your argument then boils down to the idea that you can do something manually but it becomes illegal if you use a tool to do it efficiently.

gruez|9 months ago

>The fatal flaw in your reasoning: machines aren't humans. You can't reason that a machine has rights from the fact a human has them. Otherwise it's murder to recycle a car.

That might be true but I don't see how it's relevant. There's no provision in copyright law that gives a free pass to humans vs machines, or makes a distinction between them.

Intralexical|9 months ago

> The fatal flaw in your reasoning: machines aren't humans. You can't reason that a machine has rights from the fact a human has them. Otherwise it's murder to recycle a car.

The direction we're going, it seems more likely it'll be recycling to murder a human.

jeroenhd|9 months ago

Pirating movies is also useful, because I can watch movies without paying on devices that apps and accounts don't work on.

That doesn't make piracy legal, even though I get a lot of use out of it.

Also, a person isn't a computer so the "but I can read a book and get inspired" argument is complete nonsense.

Workaccount2|9 months ago

It's only complete nonsense if you understand how humans learn. Which we don't.

What we do know though is that LLMs, similar to humans, do not directly copy information into their "storage". LLMs, like humans, are pretty lossy with their recall.

Compare this to something like a search indexed database, where the recall of information given to it is perfect.

datavirtue|9 months ago

And everyone here is downloading every show and movie in existence without even a hint of guilt.

apercu|9 months ago

>Despite that, Humans can read a book, get inspiration, and write a new book and not be litigated against.

Corporations are not humans. (It's ridiculous that they have some legal protections in the US like humans, but that's a different issue). AI is also not human. AI is also not a chipmunk.

Why the comparison?

stevenAthompson|9 months ago

Doing a cover song requires permission, and doing it without that permission can be illegal. Being inspired by a song to write your own is very legal.

AI is fine as long as the work it generates is substantially new and transformative. If it breaks and starts spitting out other people's work verbatim (or nearly verbatim) there is a problem.

Yes, I'm aware that machines aren't people and can't be "inspired", but if the functional results are the same the law should be the same. Vaguely defined ideas like your soul or "inspiration" aren't real. The output is real, measurable, and quantifiable and that's how it should be judged.

mjburgess|9 months ago

I fear our inability to measure your mind might render you without many of the legal or moral protections you imagine you have. But go ahead, tear down the law to whatever inanity can be described by the trivial machines of the world's current popular charlatans. Presumably you weren't using society's presumption of your agency anyway.

toast0|9 months ago

> Doing a cover song requires permission, and doing it without that permission can be illegal.

I believe cover song licensing is available mechanically; you don't need permission, you just need to follow the procedures including sending the licensing fees to a rights clearing house. Music has a lot of mechanical licenses and clearing houses, as opposed to other categories of works.

datavirtue|9 months ago

"If it breaks and starts spitting out other people's work verbatim (or nearly verbatim) there is a problem."

Why is that? Seems all logic gets thrown out the window when invoking AI around here. References are given. If the user publishes the output without attribution, NOW you have a problem. People are being so rabid and unreasonable here. Totally bat shit.

vessenes|9 months ago

Thank you - a voice of sanity on this important topic.

I understand people who create IP of any sort being upset that software might be able to recreate their IP or stuff adjacent to it without permission. It could be upsetting. But I don't understand how people jump to "Copyright Violation" for the fact of reading. Or even downloading in bulk. Copyright controls, and has always controlled, creation and distribution of a work. Embedded in the very nature of the copyright notice is the concept that the work will be read.

Reading and summarizing have only ever been controlled in western countries via State's secrets type acts, or alternately, non-disclosure agreements between parties. It's just way, way past reality to claim that we have existing laws to cover AI training ingesting information. Not only do we not, such rules would seem insane if you substitute the word human for "AI" in most of these conversations.

"People should not be allowed to read the book I distributed online if I don't want them to."

"People should not be allowed to write Harry Potter fanfic in my writing style."

"People should not be allowed to get formal art training that involves going to museums and painting copies of famous paintings."

We just will not get to a sensible societal place if the dialogue around these issues has such a low bar for understanding the mechanics and the societal tradeoffs we've made so far, and isn't able to discuss where we might want to go and what would be best.

datavirtue|9 months ago

Exactly, it is an immense privilege to have your works preserved and promulgated through the ages for instant recall and automated publishing. It's literally what everyone wants. The creators and the consumers. The AI companies are not robbing your money or IP. Period.

caconym_|9 months ago

If it was as obvious as you claim, the legal issues would already be settled, and your characterization of what LLMs are doing as "reading and summarizing" is hilariously disingenuous and ignores essentially the entire substance of the debate (which is happening not just on internet forums but in real courts, where real legal professionals and scholars are grappling with how to fit AI into our framework of existing copyright law, e.g.^[1]).

Of course, if you start your thought by dismissing anybody who doesn't share your position as not sane, it's easy to see how you could fail to capture any of that.

^[1] https://arstechnica.com/tech-policy/2025/05/judge-on-metas-a...

jasonlotito|9 months ago

> But I don't understand how people jump to "Copyright Violation" for the fact of reading.

The article specifically talks about the creation and distribution of a work. Creation and distribution of a work alone is not a copyright violation. However, if you take in input from something you don't own, and genAI outputs something, it could be considered a copyright violation.

Let's make this clear; genAI is not a copyright issue by itself. However, gen AI becomes an issue when you are using as your source stuff you don't have the copyright or license to. So context here is important. If you see people jumping to copyright violation, it's not out of reading alone.

> "People should not be allowed to read the book I distributed online if I don't want them to."

This is already done. It's been done for decades. See any case where content is locked behind an account. Only select people can view the content. The license to use the site limits who or what can use things.

So it's odd you would use "insane" to describe this.

> "People should not be allowed to write Harry Potter fanfic in my writing style."

Yeah, fan fiction is generally not legal. However, there are some cases where fair use covers it. Most cases of fan fiction are allowed because the author allows it. But no, generally, fan fiction is illegal. This is well known in the fan fiction community. Obviously, if you don't distribute it, that's fine. But we aren't talking about non-distribution cases here.

> "People should not be allowed to get formal art training that involves going to museums and painting copies of famous paintings."

Same with fan fiction. If you replicate a copyrighted piece of art, if you distribute it, that's illegal. If you simply do it for practice, that's fine. But no, if you go around replicating a painting and distribute it, that's illegal.

Of course, technically speaking, none of this is what gen AI models are doing.

> We just will not get to a sensible societal place if the dialogue around these issues has such a low bar for understanding the mechanics

I agree. Personifying gen AI is useless. We should stick to the technical aspects of what it's doing, rather than trying to pretend it's doing human things when it's 100% not doing that in any capacity. I mean, that's fine for the layman, but anyone with any ounce of technical skill knows that's not true.