stefanos82 | 4 years ago
- does the code generated by AI belong to me or to GitHub?
- what license does the generated code fall under?
- if generated code becomes the reason for infringement, who gets the blame or faces legal action?
- how can anyone prove the code was actually generated by Copilot and not the project owner?
- if a project member does not agree with the usage of Copilot, what should we do as a team?
- can Copilot copy code from other projects and use that excerpt code?
- if yes, *WHY* ?!
- who is going to deal with legalese for something he or she was not responsible for in the first place?
- what about conflicts of interest?
- can GitHub guarantee that Copilot won't use proprietary code excerpts in FOSS-ed projects that could lead to new "Google vs Oracle" API cases?
natfriedman|4 years ago
On the training question specifically, you can find OpenAI's position, as submitted to the USPTO here: https://www.uspto.gov/sites/default/files/documents/OpenAI_R...
We expect that IP and AI will be an interesting policy discussion around the world in the coming years, and we're eager to participate!
breck|4 years ago
https://breckyunits.com/the-intellectual-freedom-amendment.h...
Great achievements like this only hammer home the point more about how illogical copyright and patent laws are.
Ideas are always shared creations, by definition. If you have an “original idea”, all you really have is noise! If your idea means anything to anyone, then by definition it is built on other ideas, it is a shared creation.
We need to ditch the term “IP”, it’s a lie.
Hopefully we can do that before it’s too late.
joepie91_|4 years ago
Uh, I very much doubt that. Is there any actual precedent on this?
> We expect that IP and AI will be an interesting policy discussion around the world in the coming years, and we're eager to participate!
But apparently not eager enough to have this discussion with the community before deciding to train your proprietary for-profit system on billions of lines of code that undoubtedly are not all under CC0 or similar no-attribution-required licenses.
I don't see attribution anywhere. To me, this just looks like yet another case of appropriating the public commons.
king_magic|4 years ago
I for one wouldn't touch this with a 10000' pole until I know the answers to these (very reasonable) questions.
stefano|4 years ago
abn120|4 years ago
So your point (1) is a distraction, and quite an offensive one to thousands of open source developers, who trusted GitHub with their creations.
qihqi|4 years ago
stwrong|4 years ago
croes|4 years ago
stephen82|4 years ago
Another question is this: let's hypothesize I work solo on a project; I have decided to enable Copilot and have reached a 50%-50% development with it after a period of time. One day the "hit by a bus" factor takes place; who owns the project after this incident?
tlamponi|4 years ago
No, it really is not that easy: as with compilers, it depends on who owned the source and which license(s) they applied to it.
Or would you say I can compile the Linux kernel and the output belongs to me, as compiler operator, and I can do whatever I want with it without worrying about the GPL at all?
user-the-name|4 years ago
So, to be clear, I am allowed to take leaked Windows source code and train an ML model on it?
patrickthebold|4 years ago
dylannorthrup|4 years ago
Looking at the four factors for fair use, it looks like Copilot will have these issues:
- The model developed will be for a proprietary, commercial product
- Even if each is a small part of the model, all the training data for that model are fully incorporated into the model
- There is a substantial likelihood of money loss ("I can just use Copilot to recreate what a top tier programmer could generate; why should I pay them?")
I have no doubt that Microsoft has enough lawyers to keep any litigation tied up for years, if not decades. But your contention that this is "okay because it's fair use" based on a position paper by an organization supported by your employer... I find that reasoning dubious at best.
deepnash|4 years ago
You can get past GPL, LGPL and other licenses this way. Microsoft can finally copy the Linux kernel and get around GPL :-).
unknown|4 years ago
[deleted]
unknown|4 years ago
[deleted]
gpm|4 years ago
Is it even copyrighted? Generally, my understanding is that to be copyrightable, a work has to be the output of a human creative process, and this doesn't seem to qualify (I am not a lawyer).
See also, monkeys can't hold copyright: https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...
tlamponi|4 years ago
Isn't it subject to the licenses of the code the model was created from? The learning is basically just an automated transformation of the code, which would still be under the original license. Otherwise I could just run some minifier, or some other, more elaborate, code transformation, on some FOSS project, for example the Linux kernel, and relicense it under whatever.
Does not sound right to me, but IANAL and I also did not really look at how this specific model/s is/are generated.
If I did some AI on existing code I'd be quite cautious and group by compatible licence classes, asking the user what their project's licence is and then only using the compatible parts of the models. Anything else seems not really ethical and rather uncharted territory in law to me, which may not mean much as IANAL and just some random voice on the internet, but FWIW at least I tried to understand quite a few FOSS licences to decide what I can use in projects and what not.
Does anybody know of relevant cases concerning AI models and the input data they were built from, ideally in the US or a European country?
lawtalkinghuman|4 years ago
croes|4 years ago
agilob|4 years ago
https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...
natfriedman|4 years ago
viccuad|4 years ago
Read it all, and the questions still stand. Could you, or anyone on your team, point me to where the questions are answered?
In particular, the FAQ doesn't assure that the "training set from publicly available data" doesn't contain license or patent violations, nor does it say whether that code is considered tainted for a particular use.
samtheprogram|4 years ago
> You can use the code anywhere, but you do so at your own risk.
Something more explicit than this would be nice. Is there a specific license?
EDIT: also, there’s multiple sections to a FAQ, notice the drop down... under “Do I need to credit GitHub Copilot for helping me write code?”, the answer is also no.
Until a specific license (or explicit lack thereof) is provided, I can’t use this except to mess around.
dvaun|4 years ago
netcraft|4 years ago
Edit: you have to click the things on the left, I didn't realize they were tabs.
kitsune_|4 years ago
rozab|4 years ago
amelius|4 years ago
gpm|4 years ago
heavyset_go|4 years ago
chuinard|4 years ago
Tainnor|4 years ago
This is not one lone developer with a passion promoting their cool side-project. It's GitHub, which is an established brand and therefore already has a leg up, promoting their new project for active use.
I think in this case, it's very relevant to post these kinds of questions here, since other people will very probably have similar questions.
peddling-brink|4 years ago
The commenter isn't interrogating some indie programmer. This is a product of a subsidiary of Microsoft, who I guarantee has already had a lawyer, or several, consider these questions.
king_magic|4 years ago
ericbarrett|4 years ago