That would be considered a derivative work of the C code, therefore copyright protected, I believe.
Can you replay all of your prompts exactly the way you wrote them and get the same behaviour out of the LLM generated code? In that case, the situation might be similar. If you're prodding an LLM to give you a variety of resu
But significantly editing LLM generated code _should_ make it your copyright again, I believe. Hard to say when this hasn't really been tested in the courts yet, to my knowledge.
The most interesting question, to me, is who cares? If we reach a point where highly valuable software is largely vibe coded, what do I get out of a lack of copyright protection? I could likely write down the behaviour of the system and generate a fairly similar one. And how would I even be able to tell, without insider knowledge, what percentage of a code base is generated?
There are some interesting abuses of copyright law that would become more vulnerable. I was once involved in a case where the court decided that hiding a website's "disable your ad blocker or leave" popup was actually a case of "circumventing effective copyright protection". In this day and age, they might have had to produce proof that it was, indeed, copyright protected.
"Can you replay all of your prompts exactly the way you wrote them and get the same behaviour out of the LLM generated code? In that case, the situation might be similar. If that's not the case, probably not." Yes and no. It's possible in theory, but in practice it requires control over the sampling seed, which you typically don't have in AI coding tools. With local models, at least, you can fix the seed and make generation deterministic.
That said, you don't always get a 100% deterministic build when compiling code either.
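To make the seed point concrete, here's a toy sketch in Python. It is not a real model: `VOCAB`, `WEIGHTS`, `sample_tokens` and `greedy_tokens` are made-up names standing in for a next-token distribution and a sampler. The only thing it shows is that once you control the RNG seed, replaying the same "prompt" gives the same output.

```python
import random

# Hypothetical stand-in for an LLM's next-token distribution.
# A real model recomputes this at every step; here it is fixed for simplicity.
VOCAB = ["int", "main", "(", ")", "{", "}", "return", "0", ";"]
WEIGHTS = [5, 4, 3, 3, 2, 2, 4, 3, 2]


def sample_tokens(seed, n=12):
    """Sample n tokens using a private RNG, so the seed alone decides the output."""
    rng = random.Random(seed)
    return rng.choices(VOCAB, weights=WEIGHTS, k=n)


def greedy_tokens(n=12):
    """Temperature-0 analogue: always take the most likely token, no seed involved."""
    best = VOCAB[WEIGHTS.index(max(WEIGHTS))]
    return [best] * n


if __name__ == "__main__":
    first = sample_tokens(seed=1234)
    replay = sample_tokens(seed=1234)
    print(first == replay)  # same seed, same "generation"
    print(greedy_tokens(3))
```

With a hosted tool you usually can't pass that seed in, which is the practical problem. Even with a fixed seed, a real inference stack may still differ from run to run for other reasons (hardware, batching), so treat this as the idealised case.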
Copyright'd in, copyright out. Your compiled code is subject to your copyright.
You need "significant" changes to public domain (PD) material to make it yours again. Because LLMs are predicated on massive use of public data, they require the output to be PD. Otherwise you'd be violating the copyright on the training data, which belongs to hundreds of thousands of individuals.
No, and your comment is ridiculously bad faith. Courts ruled that outputs of LLMs are not copyrightable. They did not rule that outputs of compilers are not copyrightable.
I think that lawsuit was BS because it went on the assumption that the LLM was acting 100% autonomously with zero human input, which is not how the vast majority of them work. Same for compilers... a human has to give it instructions on what to generate, and I think that should be considered a derivative work that is copyrightable.
fhd2|2 months ago
macrolime|2 months ago
shakna|2 months ago
Public domain in, public domain out.
tapoxi|2 months ago
immibis|2 months ago
ranger_danger|2 months ago
unknown|2 months ago
[deleted]