I think in this regard it works just fine. If the laws move to say that "learning from data" while not reproducing it is "stealing", then yes, you reading others' code and learning from it is also stealing.
If I can't feed a news article into a classifier to teach it to predict whether I would like that article, that's not a world I want to live in. And yes, it's exactly the same thing as what you are accusing LLMs of.
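To make the comparison concrete, here is a toy sketch of the kind of preference classifier the comment describes: a bag-of-words Naive Bayes model trained on a few labeled headlines. The training examples, labels, and function names are all hypothetical, invented for illustration; any real recommender would be far more involved.

```python
# Toy "do I like this article?" classifier: Naive Bayes over bag-of-words.
# Purely illustrative; all training data here is made up.
from collections import Counter, defaultdict
import math

def train(examples):
    """examples: list of (text, label). Returns per-label word counts and label counts."""
    counts = defaultdict(Counter)
    labels = Counter()
    for text, label in examples:
        labels[label] += 1
        counts[label].update(text.lower().split())
    return counts, labels

def predict(counts, labels, text):
    """Pick the label with the highest log-probability for the given text."""
    words = text.lower().split()
    vocab = {w for c in counts.values() for w in c}
    total = sum(labels.values())
    best, best_score = None, float("-inf")
    for label in labels:
        score = math.log(labels[label] / total)  # class prior
        n = sum(counts[label].values())
        for w in words:
            # Laplace smoothing so unseen words don't zero out the score
            score += math.log((counts[label][w] + 1) / (n + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

examples = [
    ("new compiler optimizations in rust", "like"),
    ("deep dive into database internals", "like"),
    ("celebrity gossip roundup", "dislike"),
    ("royal family gossip special", "dislike"),
]
counts, labels = train(examples)
print(predict(counts, labels, "rust database internals"))  # → like
```

The model "learns from" the articles in the sense at issue: it keeps word statistics, not copies of the text, which is the analogy the comment is drawing.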
They should be subject to laws the same way humans are. If they substantially reproduce code they had access to then it's a copyright violation. Just like it would be for a human doing the same. But highly derived code is not "stolen" code, neither for AI nor for humans.
Me teaching my brain someone’s way of syntactically expressing procedures is analogous to AI developers teaching their model that same mode of expression.
It's not your reading that would be illegal, but your copying. This is a well-documented area of the law, and there are concrete answers to your questions.
To me, the argument is: an LLM learning from GPL code == creating a derivative of the GPL code, just "compressed" within the LLM. The LLM then goes on to create more derivatives, or is itself distributed (with the embedded GPL code).
Yes, I provide it as a service to my employer. It's called a job. Guess what? When I read code I learn from it and my brain doesn't care what license that code is under.
If the product is the result of compiling all the open source code out in the wild into an LLM, it can be argued that the derived product, the LLM itself, must follow the licensing requirements of the source code used.
The AI companies don't care much about this. When the time comes, they will open their models or stop using sources that don't meet the appropriate licensing. Their current concern is learning how to build the best models and winning the race to become the dominant AI provider - who cares if they need to use polluted sources to reach their goal. They will fix it later.