jakkos|1 month ago
I feel like this wording isn't great when there are many impactful open source programmers who have explicitly stated that they don't want their code used to train these models and licensed their work in a world where LLMs didn't exist. It wasn't their "gift", it was unwillingly taken from them.
> I'm a programmer, and I use automatic programming. The code I generate in this way is mine. My code, my output, my production. I, and you, can be proud.
I've seen LLMs generate code that I immediately recognized as copied from a book or technical blog post I'd read before (e.g. exact same semantics, very similar comment structure and variable names). Even if not legally required, crediting where you got ideas and code from is the least you can do. LLMs, meanwhile, just launder code as completely your own.
p-e-w|1 month ago
That’s been the fate of many creators since the dawn of time. Kafka explicitly stated that he wanted his works to be burned after his death. So when you’re reading about Gregor’s awkward interactions with his sister, you’re literally consuming the private thoughts of a stranger who stated plainly that he didn’t want them shared with anyone.
Yet people still talk about Kafka’s “contribution to literature” as if it were otherwise, with most never even bothering to ask themselves whether they should be reading that stuff at all.
projektfu|29 days ago
But it's true much of his work was unpublished when he died and was "rescued" or "stolen", depending on what narrative you prefer.
yuvadam|1 month ago
antirez|1 month ago
jakkos|1 month ago
LLMs are doing this on an industrial scale.
heavyset_go|1 month ago
Imustaskforhelp|1 month ago
Yes, but you can also ask the developer (whether on libera.irc or, if it's a FOSS project, at any FOSS talk) which books and blogs they followed for code patterns and inspiration, and just talk to them.
I do feel like some aspects of this are gonna get eaten away by the black box if we do spec-development imo.
jll29|1 month ago
There are subtle legal differences between "free open source" licensing and putting things in the public domain. If you use an open source license, you could forbid LLM training (in licensing law, contrary to all other areas of law, anything that is not granted to licensees is forbidden). Then you can take the big guys (MSFT, Meta, OpenAI, Google) to court if you can demonstrate they violated your terms.
If you place your software into the public domain, any use is fair, including ways to exploit the code or its derivatives not invented at the time of release.
Curiously, doesn't the GPL even imply that if you pre-train an LLM with GPLed code and use it to generate code (Claude Code etc.) that all generated code -- as derived intellectual property that it clearly is -- must also be open sourced as per GPL terms? (It would seem in the spirit of the licensors.) Haven't seen this raised or discussed anywhere yet.
zahlman|1 month ago
Established OSS licenses are all from before anyone imagined that LLMs would come into existence, let alone train on and then generate code. Discrimination on purpose is counter to OSI principles (https://opensource.org/osd):
> 6. No Discrimination Against Fields of Endeavor
> The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.
The GPL argument you describe hinges on making the legal case that LLMs produce "derived works". When the output can't be clearly traced to source input (even the system itself doesn't know how) it becomes rather difficult to argue that in court.
singpolyma3|1 month ago
If the courts decide to apply the law as you assume, the AI companies are all dead. But they are all betting that's not going to be the case. And since so much of the industry is taking the bet with them... the courts will take that into account.
sneak|1 month ago
You can do whatever you want with a gift. Once you release your code as free software, it is no longer yours. Your opinions about what is done with it are irrelevant.
wernsey|29 days ago
For example, the MIT license has this clause: "The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software."
It stands to reason that if an LLM outputs something based on MIT-licensed code then that output should at least contain that copyright notice, because it's what the original author wished.
And I saw a comment below arguing that knowledge cannot be copyrighted, but the code is an expression of that knowledge and that most certainly can be protected by copyright.
vbezhenar|1 month ago
AnimalMuppet|29 days ago
hjoutfbkfd|1 month ago
jakkos|1 month ago
However, if I was using a more recent/niche/unknown theorem, it would absolutely be considered bad practice not to cite where I got it from.
OJFord|1 month ago
antirez|1 month ago
frizlab|1 month ago
Yes. Exactly. As a developer in that case I feel almost violated in my trust in “the internet.” Well it’s even worse, I did not really trust it, but did not think it could be that bad.
bko|1 month ago
I guess the difference is AI companies bad? This is transformative technology creating trillions in value and democratizing information, all subsidized by VC money. Why would anyone in open source who claims to have noble causes be against this? Because their repo will no longer get stars? Because no one will read their asinine stack overflow answer?
https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_....
bitwize|29 days ago