emcq|3 years ago
Be wary of using this model: its licensing seems sketchy. Several of the datasets used for training, like WSJ and TED-LIUM, have clear non-commercial clauses. I'm not a lawyer, but releasing a model as "MIT" seems dubious, and hopefully OpenAI paid for the appropriate licenses during training, since they are no longer a research-only non-profit.
jefftk|3 years ago
petercooper|3 years ago
I think it'd be a lot harder to make the case that an accurate audio-to-text transcription violates the copyright of any of the training material in the way a visual output could.
bscphil|3 years ago
Most of the debate seems to center on whether everything produced by models trained on copyrighted work is a derivative work. I argue that at least some of it is, so the claim attributed to the AI companies (see quote above) is clearly false.
We're in a weird place now where AI is able to generate "near-verbatim" work in a lot of cases, but I don't see an obvious case for treating this any differently from a human reproducing IP with slight modifications. (I am not a lawyer.)
For example, copyright law currently prevents you from selling a T-shirt with the character Spider-Man on it. But plenty of AI models can give you excellent depictions of Spider-Man that you could put on a T-shirt and try to sell. It's quite silly to think that any judge is going to take you seriously when you argue that your model, which was trained on a dataset that included pictures of Spider-Man, and was then asked to output images using "Spider-Man" as a search term, has magically circumvented copyright law.
(I think there's a valid question about whether models represent "derivative work" in the GPL sense specifically, but I'm using the idea more generally here.)
emcq|3 years ago
nshm|3 years ago
pabs3|3 years ago