Put 10,000 hours into writing a book. Then watch somebody with more resources or media coverage take full credit for it and/or make the money instead of you.
Copyright is a good thing. The same principle applies to the core of similar laws.
zzo38computer|11 months ago
I agree with you. I think that copyright is bad, and patents are also bad.
It is a different issue if they steal your private data or your power (I mean the electrical power for the computers, in case that isn't already clear).
Making copies of published books, music, etc. (and doing what you want with them) is not the bad thing.
spatialspice|11 months ago
It feels like there are two equally valid sides to this argument that get muddied by our current model’s/regulations’ inability to differentiate one from the other.
On the free-information side, I don’t think anyone would argue that AI shouldn’t be allowed to offer a general synopsis of a given book / series. From an author’s or creator’s POV, it feels like extortion for it to be able to summarize/recreate any given chapter/subsection to the point that the entire work could be reproduced near-verbatim.
IMO the question is: can we meaningfully draw a line between the two, and if so, how?
rich_sasha|11 months ago
I don't think anyone is stopping AI from learning on the synopses of books, or learning on books after licensing costs have been paid. It's the wanting to have the cake, and eat the cake, and for free, that is galling.
3eb7988a1663|11 months ago
In contrast to typical corporate crime, here there seems to be documentation of upper management signing off on the decision.
Are there other juicy examples where the C-suite can be directly implicated? I always assumed that management knew how to leave instructions vague enough to keep their hands clean (à la "will no one rid me of this meddlesome priest?"). The bad actor was always some middle manager gone rogue.
acomjean|11 months ago
I think the main issue is that authors published books with the intention of human, not machine, consumption. Nobody thought to put a contract in a book saying "human consumption only, not to be used to train AI". Meta pirated the books in question, but what if they had bought a copy? Oddly, cracking the encryption, a violation of the DMCA, might be the infraction.
The courts have some tough questions to answer here.
CamperBob2|11 months ago
If training AI doesn't constitute fair use, you will lose more than you could ever possibly hope to gain. As will the rest of us.
Meanwhile, sublimate your dudgeon into advocating for free access to the resulting models. That's what's important. Meta is not the company you want to go after here, since they released the resulting model weights.
ebiederm|11 months ago
Unauthorized copying (aka pirating) is definitely a copyright violation.
That appears to be a huge problem with the large models and training: they don't secure legal access to the materials they train on, and thus fail to compensate authors for their work.
Likewise, students are required to buy or otherwise obtain legal access to their textbooks (like checking the book out of the library).
Training AI should play by the same rules human students have to follow.
verzali|11 months ago
Does fair use imply that pirating copyrighted material is ok?
alanfranz|11 months ago
I mean, it’s a serious question; I don’t see this as really connected.
As long as an AI can “understand” the content of a book and spit out a summary of it, or even leverage what it learned to perform further inference, I’d be inclined to say that this is fair use; a human would do the same.
But this has nothing to do with using pirated material for training, especially for some kind of commercial purpose (even if Llama is free, they’re building on top of it) - I don’t see why it should be legal.
OneDeuxTriSeiGo|11 months ago
Why should it be fair use? Why would being a derivative work not be OK? There is a massive corpus of public domain and FOSS works, and likewise plenty of permissively licensed government-created datasets. There is no reason why a corpus created from these sources would be insufficient.
card_zero|11 months ago
The illustration shows a page from Matter by Iain M. Banks. I don't suppose that's an IP violation, but it implies a human artist with attention to detail.
Mind you, it's page 1, and the book is not on page 1.
spudlyo|11 months ago
Speaking broadly, the publishers who hold the copyrights on these materials have often behaved poorly. From overbroad DMCA takedown demands that chill fair use, to threatening libraries, students, and scholars with lawsuits and stiff penalties for minor infringements, to "copyright trolling" campaigns sending mass settlement demands to alleged infringers -- I have little sympathy for copyright holders.
I'm still angry about how publishers and the Authors Guild sued Google over Google Books. Intellectual property is why we as a society can't have nice things. While I'm not a fan of Meta, their open-weight models are probably one of the best things they've ever done, and I'll back big tech over publishers every time.
moscoe|11 months ago
The idea that you can’t train on copyrighted materials is ludicrous, imho. So apparently you don’t want humanity and the future of intelligence to benefit from your work? You just want to keep it locked up in some archive that virtually no one ever reads?
Might as well say the people who read your books aren’t allowed to teach the concepts or theories. Completely asinine argument. If you don’t want the knowledge to proliferate, then don’t publish. They’re not copying and redistributing.
Meanwhile, jurisdictions outside of US copyright protection will leapfrog us because we can’t get out of our own way.
sepositus|11 months ago
I’m not sure that’s the whole picture. Followed to its logical conclusion, everyone should have the right to pirate whatever books they want and then feed them into a local LLM. Which leads to less kickback to the author, which means they can’t sustainably write, and we end up in a worse-off situation.
throwaway150|11 months ago
> The idea that you can’t train on copyrighted materials is ludicrous, imho.
Let us for a minute accept that it is ok to train on copyrighted materials. I don't believe that, but I'll humor you. So let's accept it.
To train on copyrighted materials, they need to purchase the copyrighted materials, correct? If you wanted to train a model on all O'Reilly books, you'd purchase the O'Reilly books first, wouldn't you?
Do you think it is ok to make illegal pirated copies of the books to do your training?
fsckboy|11 months ago
What's interesting here are claims like
"Meta can regurgitate virtually any excerpt from any of my books, therefore they have stolen them"
versus what is not interesting, such as
"my books are in libgen, therefore they stole my work, even though I can't find direct evidence of the theft".
> The most damning thing? It appears that Meta knew exactly what they are doing, and chose to proceed anyway.
That is not the most damning thing. It might trigger worse damages or an elevation of the severity of an infraction, but it is not evidence of guilt per se, which is what I would call "damning".
CaffeineLD50|11 months ago
I guess Zuck's newfound interest in manly combat sports is based on his expectation of seeing PDiddy, SBF, and Luigi in Club Fed over his l33+ pir8cy. It all makes sense now.
puppycodes|11 months ago
But also, no one is selling "your book"; the product is completely different in literally every conceivable way.
You have never owned (and no one ever should own) words arranged in a certain way. You own the right to sell a book, not the words themselves.
Meta does bad things and I'm not a fan, but this really pales in comparison.
card_zero|11 months ago
I wonder if an equivalent to Performance Rights Organizations will emerge as a channel for LLM publishers (so to speak) to pay fees.
CaffeineLD50|11 months ago
#freezuck