(a) immediate changes to the licenses for open-source code created by developers that will allow access and/or use of any open-source code to humans only;
(b) we suggest revisions to the Massachusetts Institute of Technology (``MIT'') license so that AI systems procure appropriate licenses from open-source code developers, which we believe will harmonize standards and build social consensus for the benefit of all of humanity rather than profit-driven centers of innovation;
(c) We call for urgent legislative action to protect the future of AI systems while also promoting innovation; and
(d) we propose that there is a shift in the burden of proof to AI systems in obfuscation cases..."
Questioning the premise assumed in saying that code is "stolen" -- isn't transformed copyrighted material protected from copyright claims? Surely neural network training falls under that category?
"Transformative" in copyright is a much stronger word than the word "transform" is in computer science. If your "transformation" could be used in many of the same places as the original, it is most likely not transformative in the copyright sense. If you modify or add on to a work, most of the time what you're creating is called a derivative work, which is not protected from copyright claims. We don't really have established law on this particular case, but to me, what neural networks do seems much more similar to tracing or redrawing some artwork from memory than it does to, for example, a parody.
But also, non-human actors cannot create copyrighted works, transformative or otherwise. For example, a photographer once gave his camera to a monkey and it took some pretty interesting pictures. He tried to get copyright for these photos, but the Copyright Office ruled that he wasn't the one who created them and the monkey is not a person, so the photos are not copyrightable at all. The law could be changed in these cases, but if we're treating the AI as a creative agent rather than a reservoir of other people's works that humans can request from it, then it seems to me like AI-created works should simply not be copyrightable.
A derivative work is something like making a collage or a video game modification/sequel.
An AI isn't really either of those. It doesn't contain the original artwork in any sense, and as a trained numerical model it's not a derivative of any other work it was trained on - no more than you are after looking at the artwork and learning from it, at least.
In the copyright sense it should be treated like a person: if it produces something that is clearly a derivative copy of an existing work, hit it with copyright. Otherwise, treat the output as novel.
If you use an AI, you need to be held liable for whatever you take from it and distribute; if you violate copyright, you violate copyright.
But the AI itself is no such thing. It's a model trained on billions of examples, and it fundamentally cannot contain all of the works it was trained on - the model is far too small and the data far too large.
Suing over the model "containing" artwork would be like suing over Photoshop "containing" artwork because you can produce a derivative work with it.
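The "too small a model, too large a data" point can be made concrete with back-of-the-envelope arithmetic. The numbers below are illustrative assumptions of mine (not figures from the thread or the paper), but the conclusion holds for any large model trained on a much larger corpus:

```python
# Rough check: can a model literally "contain" its training set?
# All numbers are illustrative assumptions for a large language/code model.

params = 175e9            # assumed parameter count
bytes_per_param = 2       # fp16 storage
model_bytes = params * bytes_per_param

corpus_tokens = 1e12      # assumed training tokens
bytes_per_token = 4       # rough average for text/code
corpus_bytes = corpus_tokens * bytes_per_token

# Model capacity available per byte of training data.
ratio = model_bytes / corpus_bytes
print(f"model: {model_bytes/1e12:.2f} TB, corpus: {corpus_bytes/1e12:.2f} TB")
print(f"capacity per corpus byte: {ratio:.3f}")
```

Under these assumptions the model has under a tenth of a byte of capacity per byte of data, so verbatim storage of the whole corpus is impossible - though memorization of specific, frequently repeated snippets can and does still happen.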
AI code generation systems provide responses (output) to questions or requests by accessing the vast library of open-source code created by developers over decades. However, they do so by allegedly stealing the open-source code stored in virtual libraries, known as repositories.
I might just be super biased, but this article seems much more like personal musings than any sort of academic investigation - this description of how LLMs work isn't even close. Also, you don't get to say "allegedly" and then just assume it's true!
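For contrast with the paper's "accessing a vast library" description, here is a toy sketch of what training actually produces (my own minimal example, a character-bigram stand-in for an LLM): the trained artifact is a table of statistics, and the training text itself is not stored or consulted at generation time.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count character-pair frequencies; the returned table IS the model."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def most_likely_next(model, ch):
    """Suggest the statistically most common successor of ch."""
    if ch not in model:
        return None
    return model[ch].most_common(1)[0][0]

corpus = "def add(a, b): return a + b"
model = train_bigram(corpus)

# The model holds pair statistics, not the corpus string itself.
print(most_likely_next(model, "r"))
```

Real LLMs replace the count table with billions of learned weights, but the shape of the claim is the same: inference reads parameters, not repositories.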
The authors here are, in turn:
- a lawyer/entrepreneur in copyright-related business (?) and a bunch of random shit like houses made out of mycelium
- the administrator who ran the US Air Force “accelerator” that paid for this research - supposedly an accomplished scientist in many fields but also how likely is it that they significantly contributed?
- an “air force judge advocate” who’s been named to investigate these issues and supposedly writes about technology
- the 2L law student who probably actually wrote the article and did all the work (typical, lol)
All of these people have impressive resumes, but none that make me question my initial assessment: they're going off their gut rather than looking at what LLMs actually do during training.
More importantly, all of this is moot and a waste of resources IMO - the LLMs are already trained well on language, and further improvements will not come from just plugging in a bigger corpus. I also really really don’t think Microsoft will have to crawl GitHub repos to teach GPT6 to program in python 4.0; generative examples, reading the spec, and agential systems make that concern moot.
More gripes, because I'm feeling cranky and this paper makes me smile:
The rise of Generative Artificial Intelligence systems (“AI systems”) parallels the Greek myth of Pandora who was overwhelmed with curiosity and opened the Box “[r]eleasing curses upon mankind.” Pandora’s Box is not solely about evil or curses as the artifact-looking Box included Elpis, the personified spirit of Hope, and is a clear reminder that a lot of good can come out of the development of AI systems.
To add some more context to the analogies here, AI systems can be thought of as: “[t]he monster plant Audrey II in Little Shop of Horrors, constantly crying out ‘Feed me!’” Why? Because ChatGPT and other AI systems provide a natural language response (output) to questions or requests by accessing vast libraries of content created over decades.
Are we allowed to start academic literature like that…? That’s goofy af
AI systems trained on code are nothing more than detection calculators or processors that can suggest or simulate statistical patterns, which is certainly not the equivalent of human-like reasoning.
They quote an argument for this that I can't find, so fair enough; but regardless, can we agree that "certainly" is misused here?! Let's stick with "apparently" or "we argue".
AI systems are certainly here to stay and will replicate as the mythical Lernaean Hydra, which seems to support the idea that the only way to defeat the monster plant is to cut off its food supply.
Literally the only reason I read this paper was to figure out what a Lernaean Hydra is, and AFAICT it's just mentioned this one time in the conclusion. And for all that, the metaphor seems… random? “LLMs are like hydras because we can only kill them through starvation” doesn’t relate very clearly to the actual thesis, which is “the law should mandate that companies pay individuals for the use of their content in AI training sets”
P.S. it is hilarious that the fucking US Military is raising this concern, considering that they’re one of the biggest consumers of open source software even though most open source developers would absolutely hate that
The paper proposes (on p32) that the following four paragraphs (pp 35-36) be added to the standard MIT license (which begins with "Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files...").
> The terms “person” and “individual” are defined as a natural person, as the term is defined by the United States Patent and Trademark Office (PTO), and/or 35 U.S.C. § 100, as amended. The term “Artificial Intelligence Model” means any non-human generative machine learning system or computer program, algorithm, or functional prediction engine supported by cloudbased/computing platforms. The term “Source Code” means the preferred form of a program for making, creating, and modifying software source code, documentation source, and configuration files.
> No use, modification, combination, study, collection, share, reproduction, distribution, and/or access of Software may be made under this License, by any non-human generative Artificial Intelligence Model without the express written consent of the inventor, which may be withheld or delayed for any reason. Any appropriation, adoption, disclosure, reproduction, use, and/or access of the licensed Software by any non-human Generative Artificial Intelligence Model shall immediately terminate all rights granted to the Licensee. The Licensor shall have the right, at any time, to withdraw consent by written notice, thereby terminating with immediate effect all use of Software made under this License unless otherwise specified. This License is the controlling instrument and supersedes all prior and conflicting Terms of Service, Privacy Statements, and/or Terms for Additional Products and Features of source repositories where this License may be distributed by the owner of the License.
> By accessing and using this data, you acknowledge that you have read, understood, and agree to be bound by these terms and conditions. If you do not agree to these terms and conditions, you may not access or use this data. You may not use this data for the training or inference of Generative artificial intelligence models without the prior permission of the copyright holder. (“Generative artificial intelligence models” are used to create new content or data that is similar to the original data, but not identical. Examples of Generative artificial intelligence models include but are not limited to, text generation models, image and video generation models, and music generation models. The restrictions on Generative artificial intelligence models apply to any use of this data, whether the generative artificial intelligence is trained on this data or uses this data for inference.)
> Any attempt by other artificial intelligence models to access or use this data without such permission shall be deemed a violation of this license and a breach of copyright laws. The copyright holder reserves the right to pursue all legal remedies available, including but not limited to injunctive relief and damages, against any party that violates this license.
[+] [-] jrflowers|2 years ago|reply
Are Altman And Yudkowsky the Troilus and Cressida of AI?
Is AMD The Agamemnon Of Machine Learning?
[+] [-] belter|2 years ago|reply
https://arxiv.org/ftp/arxiv/papers/2306/2306.09267.pdf
[+] [-] zalyalov|2 years ago|reply
That would be the most hilarious outcome possible. Oops, you can't store open source code on your computer, the access is only for humans!