(no title)

Yeah, that's a fair critique. I think the short answer is that it depends on who you ask.

See this FAQ here: https://www.licenses.ai/faq-2

Specifically:

Q: "Are OpenRAILs considered open source licenses according to the Open Source Definition? NO."

A: "THESE ARE NOT OPEN SOURCE LICENSES, based on the definition used by Open Source Initiative, because it has some restrictions on the use of the licensed AI artifact.

That said, we consider OpenRAIL licenses to be “open”. OpenRAIL enables reuse, distribution, commercialization, and adaptation as long as the artifact is not being applied for use-cases that have been restricted.

Our main aim is not to evangelize what is open and what is not but rather to focus on the intersection between open and responsible licensing."

FWIW, there's a lot of active discussion in this space, and it may turn out that, e.g., communities settle on releasing code under OSI-approved licenses and models/artifacts under lowercase "open" but use-restricted licenses.

kmeisthax|3 years ago

My biggest critique of OpenRAIL is that it's not entirely clear that AI is copyrightable[0] to begin with. Specifically, the model weights are just a mechanical derivation of training set data. Putting aside the "does it infringe[1]" question, there is zero creativity in the training process. All the creativity is either in the source images or the training code. AI companies scrape source images off the Internet without permission, so they cannot use the source images to enforce OpenRAIL. And while they would own the training code, nobody is releasing training code[2], so OpenRAIL wouldn't apply there.

So I do not understand how the resulting model weights are a subject of copyright at all, given that the US has firmly rejected the concept of "sweat of the brow" as a copyrightability standard. Maybe in the EU you could claim database rights over the training set you collected. But the US doesn't enforce those either.

[0] I'm not talking about "is AI art copyrightable" - my personal argument would be that the user feeding it prompts or specifying inpainting masks is enough human involvement to make it copyrightable.

The Copyright Office's refusal to register AI-generated works has, so far, been limited purely to people trying to claim Midjourney as a coauthor. They are not looking over your work with a fine-toothed comb and rejecting any submissions that have badly painted hands.

[1] I personally think AI training is fair use, but a court will need to decide that. Furthermore, even if training is fair use, that would not make selling access to the AI or its output fair use.

[2] The few bits of training code I can find are all licensed under OSI/FSF approved licenses or using libraries under such licenses.

nickvincent|3 years ago

This is a great point.

Not a lawyer, but as I understand it, the most likely way this question will be answered (for practical purposes in the US) is via the ongoing lawsuits against GitHub Copilot, Stable Diffusion, and Midjourney.

I personally agree that the creativity is in the source images and the training code. But unless it is decided that, for legal purposes, "AI artifacts" (the files containing model weights, embeddings, etc.) are just transformations of training data, and therefore content subject to the same legal standards as other content, I see a lot of value in letting people license training data, code, and models separately. And if models are just transformations of content, I expect we can adjust the norms around licensing to achieve similar outcomes (i.e., trying to balance open sharing with some degree of creator-defined use restriction).

twoodfin|3 years ago

How would you distinguish “just a mechanical derivation of training set data” from compiled binary software? The latter also seems to be a mechanical derivation of the source code, yet it inherits the same protections under copyright law.

taneq|3 years ago

“Mechanical derivation” is doing a lot of heavy lifting here. What qualifies something as “mechanical”? Any algorithm? Or just digital algorithms? Any process entirely governed by the laws of physics?

cwkoss|3 years ago

Is the choice of what to train upon not creative? I feel like it can be.

kaoD|3 years ago

> nobody is releasing training code

Interesting. Why is this happening?

skybrian|3 years ago

Fair enough. "Source available" would be better than "open source" in this case, to avoid misleading people. (You do want them to read the terms.)

daveloyall|3 years ago

I'm not familiar with machine learning.

But, I'm familiar with poking around in source code repos!

I found this https://huggingface.co/openjourney/openjourney/blob/main/tex... . It's a giant binary file. A big binary blob.

(The format of the blob is Python's "pickle" format: a binary serialization of an in-memory object, used to store the object and load it later, perhaps on a different machine.)
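A minimal sketch of the pickle round trip described above, using Python's standard `pickle` module. The dictionary of "weights" and its key names are made up for illustration; this is not the actual structure of the linked file.

```python
import pickle

# Hypothetical stand-in for a model's in-memory state:
# made-up layer names mapped to lists of numbers.
weights = {"layer1.weight": [0.12, -0.5, 0.33], "layer1.bias": [0.01]}

# Serialize the object to bytes (what would end up in a file on disk).
blob = pickle.dumps(weights)

# Later, possibly on another machine, rebuild an equal object from those bytes.
restored = pickle.loads(blob)

print(type(blob).__name__)  # bytes
print(restored == weights)  # True
```

Note that the bytes only describe the finished object; nothing in them records how the numbers were produced, which is why a training script and data list would have to be published separately.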

But, I did not find any source code for generating that file. Am I missing something?

Shouldn't there at least be a list of input images, etc., and some script that uses them to train the model?

JoshTriplett|3 years ago

Yeah, this should not have a headline of "open source". Really disappointing that this isn't actually open, or even particularly close to being open.

EamonnMR|3 years ago

Seems like 'the lawyers who made the license' and the OSI might be good authorities on what's open source. I'd love to hear a good FSF rant about RAIL though.

dmm|3 years ago

Are ML models even eligible for copyright protection? The code certainly but what about the trained weights?

charcircuit|3 years ago

My thought is that it is a derivative work of the training data. The creativity comes from what you choose to include or exclude.