skilled | 1 month ago
Does this even make sense? Are the copyright laws so bad that a statement like this would actually be in NVIDIA’s favor?
jkaplowitz|1 month ago
Quoting the text which the FSF put at the top of that page:
"This paper is published as part of our call for community whitepapers on Copilot. The papers contain opinions with which the FSF may or may not agree, and any views expressed by the authors do not necessarily represent the Free Software Foundation. They were selected because we thought they advanced the discussion of important questions, and did so clearly."
So, they asked the community to share thoughts on this topic, and they're publishing interesting viewpoints that clearly advance the discussion, whether or not they end up agreeing with them. I do acknowledge that they paid $500 for each paper they published, which gives some validity to your use of the verb "commissioned", but that's a separate question from whether the FSF agrees with the conclusions. They certainly didn't choose a specific author or set of authors to write a paper on a specific topic before the paper was written, which a commission usually involves, and even then the commissioning organization doesn't always agree with the paper's conclusion unless the commission isn't considered done until the paper is updated to match the desired conclusion.
> You will notice that the FSF has not rushed out to file copyright infringement suits even though they probably have more reason to oppose LLMs trained on FOSS code than anyone else in the world.
This would be consistent with them agreeing with this paper's conclusion, sure. But that's not the only possibility it's consistent with.
It could alternatively be because they discovered or reasonably should have discovered the copyright infringement less than three years ago, therefore still have time remaining in their statute of limitations, and are taking their time to make sure they file the best possible legal complaint in the most favorable available venue.
Or it could simply be because they don't think they can afford the legal and PR fight that would likely result.
grayhatter|1 month ago
I agree with jkaplowitz, but for a different reason: your description still feels a bit misleading to me. The FSF-commissioned paper argues that Microsoft's use of code FROM GITHUB, FOR COPILOT is likely non-infringing because of the additional GitHub ToS. That feels like critical context to provide, given that in the very next statement you widened the claim to LLMs generally, and the FSF likely cares about code that isn't on GitHub as well.
All of that said, I'm not sure it matters, because I don't find the argument from that whitepaper very compelling; it rests critically on the additional grants in the ToS. IIRC (going only from memory), the ToS requires that you grant GitHub a license as needed to provide the service. GitHub could provide the services a user reasonably understood GitHub to provide without violating the additional clauses specified in the existing FOSS license covering the code. But that was a while ago, and I'd say it's very murky now, because everyone knows Microsoft provides Copilot, so "obviously" they need it.
Unfortunately, and importantly when dealing with copyright, the paper also covers the transformative fair use arguments in depth, and I do find those arguments very compelling. The paper (and likely others) argues that the code output from an LLM is likely transformative, and thus can't be infringing (or is unlikely to be). I think in many cases the output is clearly transformative in nature.
I've also seen code generated by Claude (and likely others as well?) that copies large sections from existing works, where it's clearly "copy/paste" and so clearly can't be fair use, nor transformative. The output clearly copies the soul of the work. Given that I have no idea what dataset they're copying this code from, it's scary enough to make me unwilling to take the chance on any of it.
gruez|1 month ago
You're probably being sarcastic but that's actually how the law works. You'll note that when people get sued for "pirating" movies, it's almost always because they were caught seeding a torrent, not for the act of watching an illegal copy. Movie studios don't go after visitors of illegal streaming sites, for instance.
ErroneousBosh|1 month ago
No, I acquired a block of high-entropy random numbers as a standard reference sample.
NitpickLawyer|1 month ago
It makes some sense, yeah. There's also precedent, in google scanning massive amounts of books, but not reproducing them. Most of our current copyright laws deal with reproductions. That's a no-no. It gets murky on the rest. Nvda's argument here is that they're not reproducing the works, they're not providing the works for other people, they're "scanning the books and computing some statistics over the entire set". Kinda similar to Google. Kinda not.
I don't see how they get around "procuring them" from 3rd party dubious sources, but oh well. The only certain thing is that our current laws didn't cover this, and probably now it's too late.
musicale|1 month ago
Except that Google acquired the books legally, and first sale doctrine applies to physical books.
> but not reproducing them
See also: "Extracting books from production language models"
https://news.ycombinator.com/item?id=46569799
olejorgenb|1 month ago
Yeah, isn't this what Anthropic was found guilty of?
bulbar|1 month ago
The whole/main intention of an LLM is to reproduce knowledge.
masfuerte|1 month ago
As a consumer you are unlikely to be targeted for such "end-user" infringement, but that doesn't mean it's not infringement.
threethirtytwo|1 month ago
Our copyright laws are nowhere near detailed enough to cover any of this, so there is indeed a logical and technical inconsistency here.
I can definitely see these laws evolving into things that are human centric. It’s permissible for a human to do something but not for an AI.
What is consistent is that obtaining the books was probably illegal. But if, say, Nvidia bought one Kindle copy of each book from Amazon and scraped everything for training, then that falls into a grey zone.
ckastner|1 month ago
Perhaps, but reproducing the book from this memory could very well be illegal.
And these models are all about production.
lelanthran|1 month ago
A type of wishful thinking fallacy.
In law, scale matters. It's legal for you to possess a single joint. It's not legal to possess 400 tons of weed in a warehouse.
Bombthecat|1 month ago
Everything else will be slurped up for and with AI and be reused.
Elfener|1 month ago
(The difference is that the first use allows ordinary people to get smarter, while the second use allows rich people to get (seemingly) richer, a much more important thing.)