GitHub Copilot is ‘unacceptable and unjust,’ says Free Software Foundation

252 points| axsharma | 4 years ago |infoworld.com

232 comments

[+] bcaa7f3a8bbc|4 years ago|reply
The position of the FSF is severely misrepresented by the title. Open the full article, you'll see that all FSF says is GitHub Copilot is proprietary software and SaaS, and all forms of proprietary software and SaaS are unacceptable and unjust. What about the copyright issue of machine learning, then? FSF says it's a new thing with many open questions, they are not really sure, right now they are calling for whitepapers from the public to hear your comments [0].

I think it's a reasonable position to take. Reducing the scope of fair use to strengthen copyleft is a double-edged sword, as it simultaneously makes copyright laws more restrictive, such a ruling can potentially be used by proprietary software vendors against the FOSS community in various ways. It's an issue that requires careful consideration.

[0] https://www.fsf.org/blogs/licensing/fsf-funded-call-for-whit...

[+] pessimizer|4 years ago|reply
> as it simultaneously makes copyright laws more restrictive, such a ruling can potentially be used by proprietary software vendors against the FOSS community in various ways.

Could it? Copyright law is FOSS's only protection. That's why it's witty - copyright law against copyright. Weakening copyright law in an ad hoc way is absolutely not good for FOSS. It's fine to rewrite copyright in a way that explicitly allows things like Copilot, as long as FOSS gets to copy bits of proprietary code, too.

Otherwise, after some appeals court judgement that the FOSS community failed to participate in (or even worse, subelements participated in on the wrong side) we're going to end up with a copyright practice that looks like the NFL exception in monopoly law.

[+] isoprophlex|4 years ago|reply
Morally, I hope the FSF wins.

Otherwise, I hope copilot makes it big. It'll create a new generation of developers that are dependent on these tools to do their work. Also it'll lower the barrier for non-software engineers to participate in writing code. SO copy pasting on steroids.

The resulting mediocre spaghetti will break at record-breaking rates; cleaning up the mess will be highly lucrative!

[+] BiteCode_dev|4 years ago|reply
Exactly.

As a freelancer, every time a client opts for a cheaper alternative, I make it very clear I would be delighted to work with them in the future anyway. It rarely fails: one or more years later, the client calls me back because their cheap alternative turned out to suck and be expensive in the end. Last month, a client from Luxembourg called after 6 years of total silence. They still had me in their listing. 3 years ago, one called me because 2 years prior, the 50k quote they rejected from me turned into a 400K bill from my competitor, and still no release yet.

My rates have been steadily increasing for years thanks to this. Before, geeks were at a disadvantage because people didn't know better, and teams with good marketing would destroy us. But now, they have been burned so many times. And it pays because more and more devs coming to the market are becoming dependent on their tooling. Now, more often than not, I work with teams that have been copy/pasting git commands not knowing what they do, that have never, ever looked at the source code of their framework or don't know how to use a debugger. The HN bubble tends to blind us to the reality of the corporate world.

Yesterday I did a deployment, but was not allowed to touch the machine. Instead, they made me call a guy sharing the screen of a Vista machine while he was sshing into prod using cmd.exe, and I had to dictate to him the instructions to debug the deployment on their custom Linux setup. A near-retirement sysadmin who couldn't type with 10 fingers, pressing the up arrow 30 times to find a bash command in the history every time. He could click on WinSCP very well though.

This 20-minute job turned into an afternoon of billing.

Though I suppose that's what I look like as a Python expert to an old timer from the 80s that can code in assembly, debug using strace and understand L1 cache :)

People are scared we are going to get automated by AI.

I am preparing for the most lucrative decade I ever worked in.

[+] moksly|4 years ago|reply
> The resulting mediocre spaghetti will break at record-breaking rates; cleaning up the mess will be highly lucrative!

Maybe, maybe not though. From the perspective of a non-tech enterprise organisation we’ve moved to more and more standardised software that is “good-enough” to avoid dealing with the delays, going over budget, not quite what we wanted and expensive support of specialised software companies.

Office365 has basically replaced half our software suite, and while we do still buy some extensions for them from 3rd party companies, Microsoft is simply getting more and more of our business by simply being good enough at a low enough cost.

I’m not going down on some conspiracy path here by the way. If anything, Microsoft is simply using this project to get free research for their Azure Automation services that are currently taking over all the RPA business from their much more expensive competitors. This needs janitors, but not well paid ones.

[+] CryZe|4 years ago|reply
Funnily enough I just used Copilot to write a reasonably huge PR (it did like 95% of the work), which was indeed mostly a copy-paste job (the whole library is a SIMD library with lots of similarities between the different types and operations). Copilot made zero mistakes when it didn't suggest something completely different, whereas the human copy-pasted code that was already in there had tons of mistakes that I noticed as I went through the library. So interestingly, when it comes to code that is mostly copy-paste but requires some subtle changes here and there (based on the type and operation, ...), Copilot is much better at it than humans.
[+] goodpoint|4 years ago|reply
> The resulting mediocre spaghetti will break at record-breaking rates

It's even worse. Copilot encourages boilerplate and poor abstraction by making it cheaper.

[+] Tenoke|4 years ago|reply
>Otherwise, I hope copilot makes it big. It'll create a new generation of developers that are dependent on these tools to do their work.

There are definitely other scenarios, like my preferred one of Copilot being legal itself but devs being responsible for using code generated from it, same as if they were using a more direct copy-paste or search tool.

[+] hippari|4 years ago|reply
I hope so, the pay rate for cleaning up per line of code better match per line of coke. Otherwise my sanity can't recharge.
[+] jfmc|4 years ago|reply
Copilot is the perfect machine for clean room design and license/copyright laundering. It is unethical and unfair to the open source community.

I do not care if it breaks code to bits and recomposes them again regurgitated by <YOUR-LATEST-AI-TECHNIQUE-HERE> in a way that is untraceable: it would not work without learning from our open source code. Code produced by this method should be automatically licensed under the most restrictive license of its input used for learning.

[+] pydry|4 years ago|reply
I'm kind of wondering if this controversy might not end up being a storm in a teacup.

From what I've seen copilot really lowers the barrier to writing buggy code. If indeed it does turn out to be a tool that lends itself to machine gunning rather than shooting yourself in the foot it almost doesn't matter who owns what IP.

The relentless attempts at developer commodification will, of course, continue, but I can already sense this one ending up like the developer outsourcing craze of the mid-2000s that the Economist also got a little too excited about.

[+] uberswe|4 years ago|reply
Copilot is a fancy autocomplete tool for code. I think the controversy comes from it being trained on public repos without adhering to licensing. I used copilot and thought the best part was when it would autocomplete based on other code I was writing. Sometimes the Copilot would help me see places where I had repetitive code which could be turned into a function.
[+] jeroenhd|4 years ago|reply
I think MS knows damn well that they've forfeited the ethics of their code generation. There's a reason they've trained the model on Github repositories instead of, say, the Windows kernel driver tree. They know their model arbitrarily copy/pastes other people's code, so they train it almost exclusively on other people's code that they don't care about if it gets stolen. Their assumption seems to be "if Bing can find it, it's up for grabs, no matter the license". Good luck getting the same treatment from MS if you upload the leaked XP kernel to github to make your own fork.

I'll accept the ethics of copilot when they add the source code for Windows, Azure and Office to their training set, because only then will MS truly reflect that their model doesn't cross the spirit or even letter of any licensing.

[+] alkonaut|4 years ago|reply
> I think MS knows damn well that they've forfeited the ethics of their code generation. There's a reason they've trained the model on Github repositories instead of, say, the Windows kernel driver tree. They know their model arbitrarily copy/pastes other people's code

Microsoft can of course create Copilot using the GitHub code. It’s not publishing any derived work on its own - and this type of access to the code is likely a large part of the reason for buying GitHub in the first place.

The only ethical issue for Microsoft here is if Microsoft sells this service (they don’t - yet) and risks including nontrivial code without attribution (seems likely, given the behavior of the preview, but if MS for example limits output to a few lines or prevents generating too large chunks verbatim the issue almost disappears).

Ethical/legal issues and risks for users of Copilot are much larger, such as if they use it to conjure up a nontrivial snippet and then not research the origin of it. It’s no better than copying it from the original location.

Microsoft could probably throw in parts of their closed source in copilot - but not even Microsoft controls that. Third parties have copyrights that prevent it too.

But people who keep code in public GitHub repos (I assume) let GitHub do things like train neural nets on it, and Microsoft obviously don’t keep much of the windows or office sources in public GitHub repos.

[+] ThrowawayR2|4 years ago|reply
> "There's a reason they've trained the model on Github repositories instead of, say, the Windows kernel driver tree."

At least part of the reason has to be because only a tiny percentage of developers use C++, particularly the flavor of C++ that Visual Studio speaks, as opposed to Javascript, Python, etc. Moreover, kernel and driver code doesn't resemble boilerplate code used in desktop applications. Is this not obvious to the people who keep repeating this?

[+] visarga|4 years ago|reply
This is such a limited take on Copilot.

1. they can check to see if the generated code is an exact copy of an example in the training set

2. when the code matches, they can discard it, they got many predictions for each prompt anyway.

3. My preferred option - they can display the URL of the source page together with the code, acting like a regular search engine at this point; this also solves the problem of not knowing the copyright status of the code
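A minimal sketch of the three steps above, assuming a hypothetical `training_index` that maps whitespace-normalized training snippets to their source URLs. Nothing here reflects how Copilot is actually implemented; `normalize` and `filter_suggestions` are illustrative names, and exact matching after whitespace normalization is the simplest possible check, not a robust one.

```python
import re


def normalize(code: str) -> str:
    """Collapse runs of whitespace so formatting differences don't hide a match."""
    return re.sub(r"\s+", " ", code).strip()


def filter_suggestions(candidates, training_index):
    """Split candidate completions into original ones and verbatim matches.

    training_index is a hypothetical dict: normalized snippet -> source URL.
    Returns (kept, flagged), where flagged holds (candidate, url) pairs.
    """
    kept, flagged = [], []
    for candidate in candidates:
        url = training_index.get(normalize(candidate))  # step 1: exact-copy check
        if url is None:
            kept.append(candidate)                      # step 2: keep non-matching predictions
        else:
            flagged.append((candidate, url))            # step 3: surface the source URL
    return kept, flagged


# Usage with made-up data; the URL is a placeholder.
training_index = {
    normalize("def add(a, b):\n    return a + b"): "https://example.com/repo/add.py",
}
kept, flagged = filter_suggestions(
    ["def add(a,  b):\n  return a + b", "def sub(a, b):\n    return a - b"],
    training_index,
)
```

With options 1 and 2, only `kept` is shown to the user; with option 3, the entries in `flagged` are shown alongside their URLs so the user can check the license.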

[+] Applejinx|4 years ago|reply
I'm curious if they took only the whole of Github after acquiring Github, or if they've taken 'all publicly visible code everywhere'. Y'know, as an open source dev who's continued to use Github after Microsoft took it over. I'm curious if I walked right into that one or if it would've made no difference whatsoever.
[+] ben-gy|4 years ago|reply
I’ve been using copilot for the past couple of months, and it’s seriously becoming a part of my daily coding workflow.

The majority of suggestions are not quite what I want but then I’ve found the more I comment my code the more personalised the suggestions get and consequently (as a solo founder in my own startup) copilot finishing my code for me during late nights trying to ship features for customers before the following day is something I have become grateful for.

It’s a double edged sword because it’s enabling me to grow my business and remain self employed, but I also understand the concerns and at the end of the day it’s not something I need to do my job (like version control or an IDE for example), but more of a nice to have…

[+] Traubenfuchs|4 years ago|reply
How is it useful to you?

It feels like the majority of my coding consists of translating extremely complex business requirements that neither the business people nor me understand 100% into highly specific code that appears to do what we want it to do. How can Copilot help me here?

[+] xondono|4 years ago|reply
It’s a shame it doesn’t cover any language I use with the technical preview.

Something tells me copilot can be a big win for very verbose languages like C, where most code follows very specific structures.

[+] worldsayshi|4 years ago|reply
What are your thoughts on the IP though? Aren't you worried that the code you're writing ends up being a huge legal ticking bomb?
[+] xaduha|4 years ago|reply
Why do I get a feeling that MS is fine with any turn of events? If some licenses get excluded, then in a way those gain more points in 'pain' according to this https://writing.kemitchell.com/2017/03/29/OSS-Business-Perce.... But what does MS care?
[+] oaiey|4 years ago|reply
MS will just retrain the model on a different input. They could not care less and will actually be happy that they get an external statement on the license situation and the ethics.
[+] neximo64|4 years ago|reply
I think it's a fantastic tool to work with though. I didn't think so when I saw the demo; I basically brushed it off. But using it is probably one of the most productive things to happen in the past decade.

I have my own GPL software out there. Most of the time I think it doesn't really get used, so it's not that much of a concern to me. I imagine it's like that for other devs too.

I suppose if you're MongoDB (similar to GPL/used to be) or some big company you care more.

[+] dmos62|4 years ago|reply
Could this become something people can't program without? Like imagine being stuck recycling the same programs and paradigms, not being able to move to something new, because Copilot hasn't seen it before.
[+] BenjiWiebe|4 years ago|reply
I'd guess yes, for some people. Others, though, will refuse to use copilot out of sheer obstinacy if nothing else. They will produce the new paradigms for copilot to then consume.
[+] hesk|4 years ago|reply
So, I'm reading the linked article by RMS about Service as a Software Substitute (SaaSS) [1], which is one of the reasons why they object to GitHub Copilot.

The key argument for why SaaSS is ethically wrong is that it denies me control over a computation that I could do on my own.

> "The clearest example is a translation service, which translates (say) English text into Spanish text. Translating a text for you is computing that is purely yours. You could do it by running a program on your own computer, if only you had the right program. (To be ethical, that program should be free.) The translation service substitutes for that program, so it is Service as a Software Substitute, or SaaSS. Since it denies you control over your computing, it does you wrong. (emphasis mine)"

I don't find that argument very convincing because it implicitly assumes that there is no alternative translation program that I could run on my own computer.

However, if there is an alternative, then a SaaS offers me choice. I can run a program on my own computer, e.g., if I am concerned about data privacy, or service reliability. The downside is that I have to install and maintain the software on my computer. Or, I could use an external service. The upside is that the barriers of use are minimal.

Of all the articles by RMS I have read so far, I find this one the least convincing.

[1] https://www.gnu.org/philosophy/who-does-that-server-really-s...

[+] BulgarianIdiot|4 years ago|reply
I wonder what their opinion will be if they exclude the GPL family of licenses and include only permissive ones.
[+] captn3m0|4 years ago|reply
They haven't made up their mind yet on the licensing problem:

>With all these questions, many of them with legal implications [..] there aren't many simple answers. To get the answers the community needs, and to identify the best opportunities for defending user freedom in this space, the FSF is announcing a funded call for white papers to address Copilot, copyright, machine learning, and free software.

Their unacceptable and unjust opinion is just from the licensing of GitHub CoPilot / Visual Studio Code itself:

>We already know that Copilot as it stands is unacceptable and unjust, from our perspective. It requires running software that is not free/libre (Visual Studio, or parts of Visual Studio Code), and Copilot is Service as a Software Substitute. These are settled questions as far as we are concerned.

[+] pornel|4 years ago|reply
You can still violate a permissive license by not including a notice and the license text with your project.
[+] user-the-name|4 years ago|reply
Their objection to it has nothing to do with the use of GPL source code. They object to it because:

> We already know that Copilot as it stands is unacceptable and unjust, from our perspective. It requires running software that is not free/libre (Visual Studio, or parts of Visual Studio Code), and Copilot is Service as a Software Substitute.

On the question of the use of source code released under the GPL, they do not have a position yet:

> With all these questions, many of them with legal implications that at first glance may have not been previously tested in a court of law, there aren't many simple answers.

[+] jedberg|4 years ago|reply
They most likely are rebuilding the engine without GPL code and then doing a bunch of functional tests to see how bad it is. If it's not significantly worse, they will probably just not include GPL code anymore.
[+] varajelle|4 years ago|reply
The reason they consider it unacceptable and unjust is simply because GitHub Copilot itself is using a proprietary blob.
[+] makeitdouble|4 years ago|reply
Looks to me like as long as there is no copyright/legal impact they don't really have an opinion ?

It also makes sense, as someone putting their code under BSD, for instance, shouldn't be bothered by copilot regurgitating their code.

[Edit: I mixed BSD and MIT, I was going for the more permissive one. The point on reproducing copyright mentions still stands though]

[+] gibbonsrcool|4 years ago|reply
Do we owe all our professors and textbook makers compensation when we make money off our brain neural networks that they trained? Everyone also keeps talking about how bad copilot is. It’s the first step! It’s only going to improve and probably fast, given the potential value creation.
[+] cblconfederate|4 years ago|reply
copilot looks very cool, but if people end up using it a lot, it probably means their programming language is not expressive enough, after all they were invented in order to be accessible to humans.

What I'd like to see is a copilot for scientific papers. There's so much duplication out there that it would be easy to train, and it would save tons of time on the chore of writing and referencing the same things over and over.

[+] k__|4 years ago|reply
I think Copilot is a hard problem, maybe it isn't even solvable.

Sometimes it blatantly copies GPL code without my knowledge.

Sometimes I myself write code that could be part of a GPL code-base, without knowing.

Funny thing is, the difference here isn't the actual code that's written, but that Copilot has seen many GPL code bases and I didn't.

Sometimes I really have the feeling Copilot understands my code base and suggests code that seems to be custom tailored to it. Albeit in most of the cases it doesn't fit 100%.

I think the latter cases are when Copilot shines and doesn't violate GPL code at all, but can I be safe? Probably never.

[+] catern|4 years ago|reply
The FSF has made no such statement. This article is complete bullshit and a slanted quote.

The FSF said it was unacceptable because it's proprietary, like Github in general.

They've made no statement about the specific details of Copilot.

[+] Accacin|4 years ago|reply
To be perfectly honest, I think people will realise it's just not that useful and forget about it pretty quickly

Even at my place of work, there were some expressing interest in it, and after playing for an hour or two, haven't touched it since. I get the impression there are more people discussing it than actually using it.

[+] fareesh|4 years ago|reply
What seems useful to me is the ability to type in "function that takes the path to an image file and returns a new image file with rounded corners".

These are not groundbreaking problems - I'm generally looking for a solution out there that uses a popular library. This is especially useful if it's a language where I'm not up to date on what the de facto library of choice is for various use-cases. In most cases, especially while prototyping, I'm not going to write it myself, nor care about which library - I'm far more concerned with some big picture goal.

If someone builds a product that can do the work of Googling a solution for me, that's the draw of the product. The code is freely available anyway.

[+] vrocmod|4 years ago|reply
I’ve been using Copilot for weeks now. It’s definitely useful for building upon what you already wrote. It’s very effective for single lines, but I don’t trust it to come up with entire functions. I tried, but obviously YMMV.

The licensing is definitely a problem, but I think that Copilot only highlighted the issue - it didn’t create it.

The concept of software license looks pretty fragile to me. You can own software but you can’t really own PL statements.

You can own the whole but you can’t really own the atomic parts that make the whole.

If so, closed-source is just a way to make you work really hard to achieve a result that someone else already achieved by means of obfuscation and secrecy. I’m not sure where open-source stands. Maybe it’s just a social contract.