natfriedman|4 years ago
In terms of the permissibility of training on public code, the jurisprudence here – broadly relied upon by the machine learning community – is that training ML models is fair use. We are certain this will be an area of discussion in the US and around the world and we're eager to participate.
SCLeo|4 years ago
To be honest, I doubt that. Maybe I am special, but if I am releasing some code under the GPL, I really don't want it to be used to train a closed-source model, which will then be used in closed-source software that generates code for closed-source projects.
zarzavat|4 years ago
For example, if I am writing a criticism of an article, I can quote portions of that article in my criticism, or modify images from the article in order to add my own commentary. Fair use protects against authors who try to exert so much control over their works that it harms the public good.
yjftsjthsd-h|4 years ago
manquer|4 years ago
colinbartlett|4 years ago
dragonwriter|4 years ago
Okay, but that's... not much of a counterargument (though, to be fair, the original claim was unsupported).
> Maybe I am special, but if I am releasing some code under GPL, I really don't want it to be used in training a closed source model
That's really not a counterargument. “Fair use” is an exception to exclusive rights under copyright, and renders the copyright holder’s preferences moot to the extent it applies. The copyright holder not being likely to want it based on the circumstances is an argument against it being implicitly licensed use, but not against it being fair use.
__MatrixMan__|4 years ago
Some of the chatter around this seems to imply that the resulting code might still carry some of the GPL. But it seems to me that it's the trained model that Microsoft should have to make available on request.
rowanG077|4 years ago
slownews45|4 years ago
npteljes|4 years ago
If you train an ML model on GPL code and then have it output some code, wouldn't that make the result a derivative of the GPL-licensed inputs?
But I guess this could be similar to musical composition: if the output neither resembles any of the inputs nor contains significant continuous portions of them, then it's not a derivative.
IncRnd|4 years ago
In this particular case, the output resembles the inputs, or there is no reason to use Github Copilot.
jazzyjackson|4 years ago
This just gives me a flashback to copying homework in school, “make sure you change some of the words around so it’s not obvious”
I’m sure you’re right re: jurisprudence, but it never sat right with me that AI engineers get to produce these big, impressive models while the people who created the training data are never compensated, let alone asked. So I posted my face on Flickr; how was I supposed to know I was consenting to benefit someone’s killer-robot facial recognition?
ramraj07|4 years ago
Hamuko|4 years ago
How does that apply to countries where Fair Use is not a thing? As in, if you train a model on a fair use basis in the US and I start using the model somewhere else?
Asmod4n|4 years ago
KMnO4|4 years ago
sicromoft|4 years ago
jamie_ca|4 years ago
eqtn|4 years ago
CyberRabbi|4 years ago
In what context? You are planning to commercialize Copilot, and in that case the calculus on whether using copyright-protected material for your own benefit counts as fair use changes drastically.
josourcing|4 years ago
----> for purposes such as criticism, news reporting, teaching, and research <----, without the need for permission from or payment to the copyright holder.
Copilot is not criticizing, reporting, teaching, or researching anything. So claiming fair use is the result of total ignorance or disregard.