top | item 27718325

throw_2021-07 | 4 years ago

Stack Overflow and Copilot are similar. Usage of both routinely violates licenses. Stack Overflow content is licensed under CC-BY-SA. Terms [1]:

* Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

* ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

In over a decade of software engineering, I've seen Stack Overflow content reused many times, only occasionally with a link to the underlying answer. Every use I've seen would clearly fail the legal terms set out by the license.

I suspect Copilot usage will similarly fail a stringent interpretation of underlying licenses, and will similarly face essentially no enforcement.

[1] https://creativecommons.org/licenses/by-sa/4.0/

lumost|4 years ago

The difference here is that it's hard to sue a company over sporadic, difficult-to-track-down uses of SO content by its own engineers.

One can now trivially coerce Copilot into regurgitating copyrighted content without attribution. Copilot's basic premise violates the CC-BY-SA terms, and that will remain true until no party can demonstrate a viable method of extracting copyrighted code.

There is now a single party, backed by a company with a $2 trillion market cap, that can be sued for flagrant copyright violations.

moyix|4 years ago

Surely you would have to sue the people using the tool to produce verbatim copies of code, not the creator of the tool?

throw_2021-07|4 years ago

Let's differentiate legal risk by the party it affects:

* Companies with engineers using Copilot. Risk here is negligible, like that of copying Stack Overflow answers, or any code that isn't under a truly permissive license like CC0 [1]. Prohibiting use of Copilot in a company based on this risk has no merit.

* GitHub and Microsoft. Their risk is higher, but worth taking. Copilot is more like Stack Overflow than Napster. Affected copyright holders added their works to GitHub and agreed to its terms, so GitHub has a legal basis to show that content in Copilot. As for facilitating copyright infringement, far more violations occur when engineers manually search for and copy code on GitHub; lawsuits against GitHub over that would be dismissed. Determining provenance is slightly harder in Copilot than in search, but GitHub could minimize its own risk by noting in Copilot's terms that users must review Copilot's suggestions for underlying license concerns. Engineers rarely will -- they routinely violate the licenses of Stack Overflow content and of code copied from elsewhere -- but that shifts responsibility away from GitHub, and legal risk to companies using Copilot remains negligible.

[1] https://creativecommons.org/share-your-work/public-domain/cc...