top | item 46941884


20k | 21 days ago

This is why it's astonishing to me that AI has passed any legal department. I regularly see AI output large chunks of code that are 100% plagiarised from a project - it's often not hard to find the original source by just looking up snippets of it. Hundreds of lines of code, completely stolen.
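The "looking up snippets" approach the commenter describes can be roughly automated: pick a few long, unusual-looking lines from the generated code and paste them into a code search engine. A minimal sketch of that heuristic, with made-up thresholds and a hypothetical helper name:

```python
def distinctive_lines(code: str, min_len: int = 30, top_k: int = 3) -> list[str]:
    """Return a few long, non-trivial lines from `code` that are likely
    unique enough to locate the original project via a code search.
    The length threshold and filter list are illustrative guesses."""
    boring = {"{", "}", "return", "pass", "else:"}
    lines = [ln.strip() for ln in code.splitlines()]
    candidates = [ln for ln in lines if len(ln) >= min_len and ln not in boring]
    # Longer lines are more likely to be globally unique.
    return sorted(candidates, key=len, reverse=True)[:top_k]

sample = "x = 1\nresult = compute_checksum(buffer, offset, length, seed)\ny = 2"
queries = distinctive_lines(sample)
```

Each returned line would then be searched manually (e.g. in GitHub code search) to see whether it appears verbatim in an existing project.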

AI doesn't actually wash licenses - it literally can't. Companies are just assuming they're above the law.



direwolf20|20 days ago

It's not about following the law — it's about avoiding penalties in practice.

Did they get penalised? Is anyone getting penalised? No? Then there's no reason for legal to block it.

And remember, when you put the GPL license on a project, it's only worth your willingness to sue anyone who violates it; otherwise your project is effectively public domain.

rjsw|20 days ago

If the LLM was trained on any GPL-licensed code, then there is an argument that all output is GPL too; legal departments should be worried.

graemep|20 days ago

I am not aware of any argument for that. Even if the output is a derivative work (which is very doubtful) that would make it a breach of copyright to distribute it under another license, not automatically apply the GPL.

If the output is a derivative work of the input then you would be in breach of copyright if the training data is GPL, MIT, proprietary - anything other than public domain or equivalent.

johnthescott|12 days ago

> by just looking up snippets of it

a fix might be for AI to cite sources accurately.

> Companies are just assuming they're above the law

and so far they appear to be above the law.

thedevilslawyer|20 days ago

This is oft-repeated but never backed up by evidence. Can you share the snippet that was plagiarized?

IX-103|20 days ago

It happens often enough that the company I work for has set up a presubmit to check all AI-generated and AI-assisted code for plagiarism (which they call "recitation"). I know they're checking the code for similarity to anything on GitHub, but they could also be checking against the model's training corpus.
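One common way such a "recitation" check could work is verbatim n-gram overlap between the candidate code and a reference corpus. This is a minimal sketch under that assumption - real presubmit systems index far larger corpora and use fuzzier matching; all names and thresholds here are hypothetical:

```python
def ngrams(tokens: list[str], n: int) -> set[tuple]:
    """All consecutive n-token windows of the token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def recitation_score(candidate: str, corpus_files: list[str], n: int = 8) -> float:
    """Fraction of the candidate's token n-grams that appear verbatim in
    any corpus file. A high score suggests copied ("recited") code."""
    cand = ngrams(candidate.split(), n)
    if not cand:
        return 0.0
    corpus: set[tuple] = set()
    for text in corpus_files:
        corpus |= ngrams(text.split(), n)
    return len(cand & corpus) / len(cand)

# Usage: a presubmit might block the change if the score exceeds a threshold.
corpus = ["def add(a, b):\n    return a + b"]
score = recitation_score("def add(a, b): return a + b", corpus, n=4)
```

Whitespace-insensitive tokenization and a smallish n make the check robust to trivial reformatting while still catching verbatim copying.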