I'm sure you think this is a clever reply, but the reality is that GitHub wouldn't even consider it, even if it were technically possible. If it got out that it had trained on confidential customer data, it would be game over. The risk is so stupidly large that nobody in their right mind would take it. So yeah, if they say they don't, they don't.
I don't understand why people just automatically doubt things that companies say when they could be sued (or it would otherwise destroy their business) if they were lying about it. Seems unnecessarily pessimistic.
People doubt Microsoft because they've historically run a very aggressive business and done things of questionable morality many times.
They've been to court, they've lost, and it definitely hasn't destroyed their business one bit.
For example, Microsoft subsidiary LinkedIn routed customer email through their servers so that they could scrape it. They did that without customer knowledge, via a dark pattern.
They later apologised for doing it but still used it to propel the company's growth. In the end it didn't hurt anything but their reputation for respecting people's privacy.
Microsoft's own anti-trust history is littered with exceptional behaviour too. They are the size they are now by dint of super aggressive business practices.
Normally because history shows us that redress via the court system is rarely punitive to a company the size of Microsoft. Further, Microsoft has a long history of lying to its customers with seemingly no impact on its business.
I mean, we discovered that the whole car industry was flagrantly lying on their emissions tests, which had the potential to destroy the whole business, and there were A LOT of people who knew about it and could have talked at any time. Why wouldn't software companies do the same?
But will that actually be against ToS or copyright? Many people tend to say that Copilot learning from OSS doesn't infringe any copyright and is no different from a person just learning from someone else's work. So how is it different if Copilot is learning from private repositories? Or, e.g., from leaked source code?
I'm frequently told on HN that Big Tech would willingly, flagrantly violate the GDPR like it's nothing, even if the upside of collecting that info was minimal and the downside was 4% of global revenue.
I guess if they can do that, then what's a small lie about private repos between friends.
Because they do shady shit. For example, by default Copilot would "sample" your code for training while you used it. Maybe this is no longer the default, maybe it still is, but it was the default.
This type of thing erodes trust. Why should my proprietary code be used for training by default? I was really annoyed by this.
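(A minimal editor-side sketch, for anyone who wants to check their own setup: this assumes VS Code, and the Copilot-specific "use my code snippets for product improvements" toggle lives in your GitHub account's Copilot settings rather than in the editor, with wording that has changed over time, so treat this as a starting point rather than a complete opt-out.)

    // settings.json -- VS Code accepts comments in this file
    {
        // Turn off the editor's general telemetry reporting.
        // Note: this is the generic VS Code switch, not a
        // Copilot-specific "don't train on my code" control;
        // that one is managed on github.com.
        "telemetry.telemetryLevel": "off"
    }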