sj4nz | 3 years ago
These are not quotations from other people's code but something about the deep structures of language and programming language semantics. However, I suspect that if you knew enough of a snippet from another source, you could coax Copilot into suggesting code learned from that source. That code would likely be washed over by other code in the corpus wherever the two overlapped in meaning.
jacoblambda | 3 years ago
The main issue is that while you can use Copilot to create "new"/transformative code, it's also trivial to get it to pump out licensed works in a form where you could claim "I didn't know it was taken from project x with license y because the tool made it for me".
I personally have no problem with Copilot in concept. However, doing it (or any other AI-model-based text/graphics tool) without infringing on people's copyrights is practically an unsolved problem, short of pre-licensing the training data ahead of time.
withinboredom|3 years ago
I really think we are entering some interesting territory that will likely open up a can of worms.
heavyset_go|3 years ago
Software is unique in that there is a cultural trend to share source code, so that makes it easy to compile into "free" datasets.
I wouldn't say it's an unsolved problem. It's just that there are no incentives to compile or pay for datasets when Microsoft already has petabytes of code to train on. If anything, I expect Microsoft to sell datasets based on GitHub repositories if Copilot-like models survive this lawsuit and become commoditized.