> Alsup ruled that Anthropic's use of copyrighted books to train its AI models was "exceedingly transformative" and qualified as fair use
> "All Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies"
It was always somewhat obvious that pirating a library would be copyright infringement. The interesting findings here are that scanning and digitizing a library for internal use is OK, and using it to train models is fair use.
You skipped quotes about the other important side:
> But Alsup drew a firm line when it came to piracy.
> "Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote. "Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic's piracy."
That is, he ruled that
- buying, physically cutting up, physically digitizing books, and using them for training is fair use
- pirating the books for their digital library is not fair use.
I'm not sure how I feel about what Anthropic did on merit as a matter of scale, but from a legalistic standpoint, how is it different from using the book to train the meat model in my head? I could even learn bits by heart and quote them in context.
> Here is how individuals are treated for massive copyright infringement:
When I clicked the link, I got an article about a business that was selling millions of dollars of pirated software.
This guy made millions of dollars in profit by selling pirated software. This wasn't a case of transformative works, nor of an individual doing something for themselves. He was plainly stealing and reselling something.
Anthropic isn’t selling copies of the material to its users though. I would think you couldn’t lock someone up for reading a book and summarizing or reciting portions of the contents.
Seven years for thumbing your nose at Autodesk when armed robbery would get you less time says some interesting things about the state of legal practice.
What point are you making? 20 years ago, someone sold pirated copies of software (where's the transformation here?) and that's the same as using books in a training set? The judge already said reading isn't infringement.
Apparently it's a common business practice. Spotify (even though I can't find any proof) seems to have built its software and business on pirated music. There is more in this article [0].
> Rumors that early versions of Spotify used ‘pirate’ MP3s have been floating around the Internet for years. People who had access to the service in the beginning later reported downloading tracks that contained ‘Scene’ labeling, tags, and formats, which are the tell-tale signs that content hadn’t been obtained officially.
Stealing with the intent to gain an unfair market advantage, so that you can effectively kill any company that acts ethically and legally, in a way that is very likely to hurt many authors through the products you create, is far worse than just stealing for personal use.
> Stealing is stealing. Let's stop with the double standards.
I get the sentiment, but that statement as is, is absurdly reductive. Details matter. Even if someone takes merchandise from a store without paying, their sentence will vary depending on the details.
oh well, the product has a cute name and will make someone a billionaire, let's just give it the green light. who cares about copyright in the age of AI?
Pirate and pay the fine is probably a hell of a lot cheaper than individually buying all these books. I'm not saying this is justified, but what would you have done in their situation?
Saying "they have the money" is not an argument. It's about the amount of effort that is needed to individually buy, scan, and process millions of pages. If that's done for you, why re-do it all?
These are the people shaping the future of AI? What happened to all the ethical values they love to preach about?
We've held China accountable for counterfeiting products for decades and regulated their exports. So why should Anthropic be allowed to export their products and services after engaging in the same illegal activity?
If you own a book, it should be legal for your computer to take a picture of it. I honestly feel bad for some of these AI companies because the rules around copyright are changing just to target them. I don't owe copyright to every book I read because I may subconsciously incorporate their ideas into my future work.
Buying, scanning, and discarding was in my proposal to train under copyright restrictions.
You are often allowed to make a digital copy of a physical work you bought. There are tons of used, physical works that would be good for training LLMs. They'd also be good for training OCR, which could do many things, including improve book scanning for training.
This could be reduced to a single act of book destruction per copyrighted work or made unnecessary if copyright law allowed us to share others' works digitally with their licensed customers. Ex: people who own a physical copy or a license to one. Obviously, the implementation could get complex but we wouldn't have to destroy books very often.
I'm not seeing how this is fair use in either case.
Someone correct me if I am wrong but aren't these works being digitized and transformed in a way to make a profit off of the information that is included in these works?
It would be one thing for an individual to make personal use of one or more books, but you've got to have some special blindness not to see that a for-profit company's use of this information to improve a for-profit model is clearly going against what copyright stands for.
Every time an article like this surfaces, it always seems like the majority of tech folks believe that training AI on copyrighted material is NOT fair use, but the legal industry disagrees.
Which of the following are true?
(a) the legal industry is susceptible to influence and corruption
(b) engineers don't understand how to legally interpret legal text
(c) AI tech is new, and judges aren't technically qualified to decide these scenarios
Most likely option is C, as we've seen this pattern many times before.
> Alsup detailed Anthropic's training process with books: The OpenAI rival spent "many millions of dollars" buying used print books, which the company or its vendors then stripped of their bindings, cut the pages, and scanned into digital files.
I've noticed an increase in used book prices in the recent past and now wonder if there is an LLM effect in the market.
By the way, I wonder if recent advances in protecting YouTube videos from downloaders like yt-d*p are caused by an unwillingness to help rival AI companies gather datasets.
I'm hoping they fail, to incentivize using legal, open, and/or licensed data. Then, they might have to attempt to train a Claude-class model on legal data. Then, I'll have a great, legal model to use. :)
"Anthropic cut up millions of used books to train Claude — and downloaded over 7 million pirated ones too, a judge said."
A not-so-subtle difference.
That said, in a sane world, they shouldn't have needed to cut up all those used books yet again when there's obviously already an existing file that does all the work.
If ingesting books into an AI makes Anthropic criminals, then Google et al. are criminals too for making search indexes of the Internet. Anything published online is equally copyrighted.
Exactly! If Anthropic is guilty of copyright infringement for the mere act of downloading copyrighted books then so is Google, Microsoft (Bing), DuckDuckGo, etc. Every search engine that exists downloads pirated material every day. They'd all be guilty.
Not only that but all of us are guilty too because I'm positive we've all clicked on search results that contained copyrighted content that was copied without permission. You may not have even known it was such.
Remember: Intent is irrelevant when it comes to copyright infringement! It's not that kind of law.
Intent can guide a judge when they determine damages but that's about it.
Yeah, we can all agree that ingesting books is fair use and transformative, but you gotta own what you ingest, you can't just pirate it.
I can read 100 books and write a book based on the inspiration I got from the 100 books without any issue. However, if I pirate the 100 books I've still committed copyright infringement despite my new book being fully legal/fair use.
[+] [-] neonate|8 months ago|reply
[+] [-] dehrmann|8 months ago|reply
> Alsup ruled that Anthropic's use of copyrighted books to train its AI models was "exceedingly transformative" and qualified as fair use
> "All Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies"
It was always somewhat obvious that pirating a library would be copyright infringement. The interesting findings here are that scanning and digitizing a library for internal use is OK, and using it to train models is fair use.
[+] [-] 6gvONxR4sf7o|8 months ago|reply
> But Alsup drew a firm line when it came to piracy.
> "Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote. "Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic's piracy."
That is, he ruled that
- buying, physically cutting up, physically digitizing books, and using them for training is fair use
- pirating the books for their digital library is not fair use.
[+] [-] jpalawaga|8 months ago|reply
[+] [-] MaxPock|8 months ago|reply
[+] [-] alok-g|8 months ago|reply
https://www.courtlistener.com/docket/67569326/598/kadrey-v-m...
Note: I am not a lawyer.
[+] [-] sershe|8 months ago|reply
[+] [-] seuraughty|8 months ago|reply
[+] [-] franczesko|8 months ago|reply
[+] [-] bgwalter|8 months ago|reply
https://investors.autodesk.com/news-releases/news-release-de...
[+] [-] piker|8 months ago|reply
[+] [-] Aurornis|8 months ago|reply
When I clicked the link, I got an article about a business that was selling millions of dollars of pirated software.
This guy made millions of dollars in profit by selling pirated software. This wasn't a case of transformative works, nor of an individual doing something for themselves. He was plainly stealing and reselling something.
[+] [-] JimDabell|8 months ago|reply
This is very different to what Anthropic did. Nobody was buying copies of books from Anthropic instead of the copyright holder.
[+] [-] stocksinsmocks|8 months ago|reply
Seven years for thumbing your nose at Autodesk when armed robbery would get you less time says some interesting things about the state of legal practice.
[+] [-] ysofunny|8 months ago|reply
in other words, provided you have enough spare capital to spin up a corporation, you can break the law!!!!
[+] [-] nh23423fefe|8 months ago|reply
This is reaching at best.
[+] [-] farceSpherule|8 months ago|reply
Come up with a better comparison.
[+] [-] chourobin|8 months ago|reply
[+] [-] marapuru|8 months ago|reply
https://torrentfreak.com/spotifys-beta-used-pirate-mp3-files...
Funky quote:
> Rumors that early versions of Spotify used ‘pirate’ MP3s have been floating around the Internet for years. People who had access to the service in the beginning later reported downloading tracks that contained ‘Scene’ labeling, tags, and formats, which are the tell-tale signs that content hadn’t been obtained officially.
[+] [-] pyman|8 months ago|reply
Stealing is stealing. Let's stop with the double standards.
[+] [-] originalvichy|8 months ago|reply
[+] [-] dathinab|8 months ago|reply
that isn't "just" stealing, it's organized crime
[+] [-] kube-system|8 months ago|reply
I get the sentiment, but that statement as is, is absurdly reductive. Details matter. Even if someone takes merchandise from a store without paying, their sentence will vary depending on the details.
[+] [-] 1970-01-01|8 months ago|reply
[+] [-] x3n0ph3n3|8 months ago|reply
[+] [-] NoMoreNicksLeft|8 months ago|reply
[deleted]
[+] [-] damnesian|8 months ago|reply
[+] [-] Der_Einzige|8 months ago|reply
[+] [-] ramon156|8 months ago|reply
Saying "they have the money" is not an argument. It's about the amount of effort that is needed to individually buy, scan, and process millions of pages. If that's done for you, why re-do it all?
[+] [-] pyman|8 months ago|reply
We've held China accountable for counterfeiting products for decades and regulated their exports. So why should Anthropic be allowed to export their products and services after engaging in the same illegal activity?
[+] [-] guywithahat|8 months ago|reply
[+] [-] hellohihello135|8 months ago|reply
[+] [-] nickpsecurity|8 months ago|reply
You are often allowed to make a digital copy of a physical work you bought. There are tons of used, physical works that would be good for training LLMs. They'd also be good for training OCR, which could do many things, including improve book scanning for training.
This could be reduced to a single act of book destruction per copyrighted work or made unnecessary if copyright law allowed us to share others' works digitally with their licensed customers. Ex: people who own a physical copy or a license to one. Obviously, the implementation could get complex but we wouldn't have to destroy books very often.
[+] [-] trinsic2|8 months ago|reply
Someone correct me if I am wrong but aren't these works being digitized and transformed in a way to make a profit off of the information that is included in these works?
It would be one thing for an individual to make personal use of one or more books, but you've got to have some special blindness not to see that a for-profit company's use of this information to improve a for-profit model is clearly going against what copyright stands for.
[+] [-] platunit10|8 months ago|reply
Which of the following are true?
(a) the legal industry is susceptible to influence and corruption
(b) engineers don't understand how to legally interpret legal text
(c) AI tech is new, and judges aren't technically qualified to decide these scenarios
Most likely option is C, as we've seen this pattern many times before.
[+] [-] adolph|8 months ago|reply
[+] [-] tliltocatl|8 months ago|reply
[+] [-] codedokode|8 months ago|reply
[+] [-] 1970-01-01|8 months ago|reply
[+] [-] nickpsecurity|8 months ago|reply
[+] [-] koolala|8 months ago|reply
[+] [-] Kim_Bruning|8 months ago|reply
"Anthropic cut up millions of used books to train Claude — and downloaded over 7 million pirated ones too, a judge said."
A not-so-subtle difference.
That said, in a sane world, they shouldn't have needed to cut up all those used books yet again when there's obviously already an existing file that does all the work.
[+] [-] carlosjobim|8 months ago|reply
[+] [-] riskable|8 months ago|reply
Not only that but all of us are guilty too because I'm positive we've all clicked on search results that contained copyrighted content that was copied without permission. You may not have even known it was such.
Remember: Intent is irrelevant when it comes to copyright infringement! It's not that kind of law.
Intent can guide a judge when they determine damages but that's about it.
[+] [-] kristofferR|8 months ago|reply
I can read 100 books and write a book based on the inspiration I got from the 100 books without any issue. However, if I pirate the 100 books I've still committed copyright infringement despite my new book being fully legal/fair use.
[+] [-] m4rtink|8 months ago|reply
Or is it perhaps not a universal cultural/moral aspect?
I guess for example in Europe people could be more sensitive to it.
[+] [-] lawlessone|8 months ago|reply
Having said that, there are tools for digitizing books that don't require destroying them.
[+] [-] stackedinserter|8 months ago|reply
[+] [-] kbelder|8 months ago|reply
Man of Two Worlds by Brian Herbert.
...and I did the world a favor.