dithered_djinn | 11 months ago
If we instead adopt the view of free software (https://www.gnu.org/philosophy/open-source-misses-the-point....), the fact that OpenAI and other large corporations train their large language models behind closed doors - with no disclosure of their training corpus - effectively represents the biggest attack on GPL-licensed code to date.
No evidence suggests that OpenAI and others exclude GPL-licensed repositories from their training sets. And nothing prevents the incorporation of GPL-licensed code into proprietary codebases. Note that a few papers have documented the regurgitation of literal text snippets by large language models (one example: https://arxiv.org/pdf/2409.12367v2).
To me, this seems like the LLM version of using coin-mixing to obscure the trail of Bitcoin transactions in the blockchain. The current situation also reminds me of how the generalization of the SaaS model led to the creation of the Affero GPL license (https://www.gnu.org/licenses/why-affero-gpl.html).
LLMs enable circumvention of both the spirit of free software licenses and the legal mechanisms that enforce them.
pabs3 | 11 months ago
Also, I don't think a restriction on the FSF's freedom 1, "the freedom to study how the program works," based on what tools you use and how you use them fits with FSF philosophy, nor do I think it is appropriate. You should be able to run whatever analysis tools you have available to study the program. Being able to ingest a program into a local LLM and then ask questions about the codebase before you understand it yourself is valuable. Or, if you aren't a programmer, or aren't familiar with the language used, a local LLM could help you make the changes needed to add a new feature. In that situation LLMs can enable practical software freedom for those who can't afford to pay, or convince, a programmer to make the changes they want.
https://www.gnu.org/philosophy/free-sw.html
In addition, OpenAI clearly does not respect copyrights and licenses in general, so it would ignore any anti-AI clauses, which would make them ineffective and thus pointless. So I think we should tackle the LLM problem through the law, not through licenses. That is already happening with ongoing case law in software, writing, artwork, etc.
It isn't possible or practical to change the existing body of Free Software to use new anti-AI clauses anyway: https://juliareda.eu/2021/07/github-copilot-is-not-infringin...
BTW, LLMs could also, in theory, be used to license-wash proprietary software; see "Does free software benefit from ML models being derived works of training data?" by Matthew Garrett:
https://mjg59.dreamwidth.org/57615.html
dithered_djinn | 11 months ago
Regarding the licensing, I'll restate my point: the Affero license was created precisely at a moment when the existing licenses could no longer uphold the freedoms that the Free Software Foundation set out to defend. A change of license was the right solution at that particular point in time. If it worked then, there is at least a precedent that such a course of action might work, and it should at the very least be considered as a possible solution for today's problems.
That said, my own personal view is more aligned with demanding that nation states pressure big corporations so that currently closed-source software becomes at least open source (either by law, or simply by ceasing to use it and investing their budgets in free alternatives instead). Note I said open source and not free. I would just like to read their code and feed it to my LLMs :)