dithered_djinn | 11 months ago
If we instead adopt the view of free software (https://www.gnu.org/philosophy/open-source-misses-the-point....), the fact that OpenAI and other large corporations train their large language models behind closed doors - with no disclosure of their training corpus - effectively represents the biggest attack on GPL-licensed code to date.
No evidence suggests that OpenAI and others exclude GPL-licensed repositories from their training sets. And nothing prevents the incorporation of GPL-licensed code into proprietary codebases. Note that a few papers have documented the regurgitation of literal text snippets by large language models (one example: https://arxiv.org/pdf/2409.12367v2).
To me, this seems like the LLM version of using coin-mixing to obscure the trail of Bitcoin transactions in the blockchain. The current situation also reminds me of how the generalization of the SaaS model led to the creation of the Affero GPL license (https://www.gnu.org/licenses/why-affero-gpl.html).
LLMs enable circumvention of both the spirit of free software licenses and the legal mechanisms that enforce them.
pabs3 | 11 months ago
Also, I don't think a restriction on the FSF's freedom 1, "the freedom to study how the program works," based on what tools you use and how you use them fits with FSF philosophy, nor do I think it is appropriate. You should be able to run whatever analysis tools you have available to study the program. Being able to ingest a program into a local LLM and then ask questions about the codebase before you understand it yourself is valuable. Or, if you aren't a programmer, or aren't familiar with the language used, a local LLM could help you make the changes needed to add a new feature. In that situation LLMs can enable practical software freedom for those who can't afford to pay, or convince, a programmer to make the changes they want.
https://www.gnu.org/philosophy/free-sw.html
In addition, OpenAI clearly does not respect copyrights and licenses in general, so it would ignore any anti-AI clauses, which would make them ineffective and thus pointless. So I think we should tackle the LLM problem through the law, not through licenses. That is already happening with ongoing case law in software, writing, artwork, etc.
It isn't possible or practical to change the existing body of Free Software to use new anti-AI clauses anyway: https://juliareda.eu/2021/07/github-copilot-is-not-infringin...
BTW, LLMs could also, in theory, be used to license-wash proprietary software; see "Does free software benefit from ML models being derived works of training data?" by Matthew Garrett:
https://mjg59.dreamwidth.org/57615.html
dithered_djinn | 11 months ago
Regarding the licensing, I'll restate my point: the Affero license was created precisely at a moment when the existing licenses could no longer uphold the freedoms that the Free Software Foundation set out to defend. A change of license was the right solution at that particular point in time. If it worked then, there is at least a precedent that such a course of action might work, and it should at the very least be considered as a possible solution for today's problems.
That said, my own personal view is more aligned with demanding that nation states pressure big corporations so that currently closed-source software becomes at least open source (either by law, or simply by ceasing to use it and investing their budgets in free alternatives instead). Note I said open source and not free. I would just like to read their code and feed it to my LLMs :)