lambdaxyzw | 1 year ago
The code is published under some license that allows some use cases and prohibits others. For example, the GPL is famous for being viral. Using it to train an LLM that spits out "unlicensed" code is basically copyright laundering.
mrweasel | 1 year ago
I'll very naively assume that Amazon, OpenAI, Google, and others check licenses before feeding data to their models. I'll stop assuming that when one of these companies admits that it doesn't actually care and that respecting licenses isn't profitable for it.
I can read a book, learn the concepts, then use or repeat those concepts. The AI can do the same. But is it really "learning"? It may just be spewing out pieces of the content without any understanding, in which case it's a copyright violation, right?
barfbagginus | 1 year ago
For an LLM that would include:
1. Training data
2. Training code and metrics
3. Hyperparameter settings
4. Output weights
Anything less is really just a misinterpretation of open source's provision for studying, modifying, and recompiling the LLM.
TL;DR: these companies MUST license the LLM under the AGPL and provide all the artifacts described above. Companies that refuse will be raided by open source copyright trolls, if we're lucky and a little mischievous.