(no title)
DarkWiiPlayer|9 months ago
That is, be able to prove a) that their models were actually trained on the data they claim, b) that they have consent to use said data for AI training, and c) that this consent was given by the actual author or with the author's consent.
I want platforms like SoundCloud, YouTube, etc. to be required to actually send out an e-mail to all of their users: "hey we will be using your content for AI training, please click here to give permission".
rafaelmn|9 months ago
dbg31415|9 months ago
We can’t ignore the ethical cost of how AI is being developed - especially when it relies on taking other people’s work without permission. Many of today’s most powerful AI systems were trained on vast datasets filled with human-made content: art, writing, music, code, and more. Much of it was used without consent, credit, or compensation. This isn’t conjecture - it’s been thoroughly documented.
That approach isn’t just legally murky - it’s ethically indefensible. We cannot build the future on a foundation of stolen labor and creativity. Artists, writers, musicians, and other creators deserve both recognition and fair compensation. No matter how impactful the tools become, we cannot accept theft as a business model.
https://arstechnica.com/tech-policy/2025/02/meta-torrented-o...
sofixa|9 months ago
Mistral waves hello. They're alive and well, and competing well.
Also, while the AI Act and copyright are handled at the EU level, I always get the impression that anyone talking about an "EU government" simply doesn't understand the EU. If you think Germans or Slovaks are rooting for Mistral just because they're European you'd be wrong - they'd be more accepting of it, maybe, due to higher trust in them respecting privacy and related rights, but that's.
DarkWiiPlayer|9 months ago
This is super simple to enforce.
For starters, we only really care about the companies developing big commercial AI products, not the people running said models on their home PCs or anything along those lines.
If a company starts offering a new AI model commercially, you simply send someone to audit it and make sure they can provide proof of consent, have their input data, etc.
In most cases, this should be enough. If there's reason to believe an AI company is actually straight up lying to the authorities, you simply have them re-train their model in a controlled environment.
Oh and no, you don't need cryptographically secure random numbers for AI training and/or operation, so you can easily just save your random seeds along with the input data for perfectly reproducible results.
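A minimal sketch of the seed idea, assuming a toy linear model in plain Python rather than any real training stack. The names `train`, `fingerprint`, and `DATA` are made up for illustration. Same seed plus same input data gives bit-identical weights, so an auditor could compare a hash of the re-trained weights against the shipped ones. Real GPU training would likely also need deterministic kernels and pinned library versions, which this sketch does not cover.

```python
import hashlib
import random

# Hypothetical stand-in for the archived training inputs.
DATA = [0.0, 0.5, 1.0, 1.5, 2.0]


def train(seed: int, data: list[float], epochs: int = 3) -> list[float]:
    """Toy training run: fit y = w*x + b with SGD over shuffled data."""
    # A dedicated seeded generator (not cryptographically secure, and it
    # does not need to be) drives both initialization and shuffling.
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # shuffle order depends only on the seed
        for i in order:
            x = data[i]
            y = 2.0 * x + 1.0  # synthetic target for the toy problem
            err = (w * x + b) - y
            w -= 0.01 * err * x
            b -= 0.01 * err
    return [w, b]


def fingerprint(weights: list[float]) -> str:
    """Hash the exact float representation of the weights."""
    return hashlib.sha256(repr(weights).encode()).hexdigest()


if __name__ == "__main__":
    first = fingerprint(train(42, DATA))
    second = fingerprint(train(42, DATA))
    print("reproducible:", first == second)
```

Storing the seed next to the input data is what makes the second run comparable to the first; changing either one changes the fingerprint.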
This isn't an enforcement problem, it's a lobbying problem. Lawmakers are convinced that AI will solve their problems for them when the reality is that it's still mostly speculation that someone will at some point find a way to make it profitable.
In reality, training and even running AI is still way too expensive for the companies selling it, even without considering the long-term economic impact of the harmful ways these models are trained (artists contribute to GDP directly, open source projects do so indirectly, and free services like Wikipedia are an important part of modern society; AI is causing massive costs to all of these).
dopidopHN|9 months ago
rapind|9 months ago
docdeek|9 months ago
Wouldn’t sites like YouTube already have a license to make money off your content anyway? This might be a little out of date but it notes that even though you own the material you upload to YouTube, by uploading it you grant them a license to make money off it, sub-license it to others for commercial gain, make derivative works etc. IANAL but this suggests to me that if you upload it to YouTube, YouTube can license it to OpenAI without needing to inform you or get additional consent. [0]
[0]: https://www.theguardian.com/money/2012/dec/20/who-owns-conte...
DarkWiiPlayer|9 months ago
In other words, now that people have had a taste of it and know what they're actually consenting to, companies should have to get renewed consent (positive consent, that is) instead of relying on "you agreed to this before it was even a real thing".
It kind of comes down to the "you can't put a 'you sell your soul' clause in the terms and conditions of a coffee subscription service" mentality: at what point do you simply say "this is obviously in bad faith" and declare it void rather than just say "it's silly, but you signed it".
And I think there are massive cultural differences regarding where that line is drawn.
lawlessone|9 months ago
simonw|9 months ago
DarkWiiPlayer|9 months ago
So in practice, no, it shouldn't. Not because that information itself is bad, but because it probably isn't limited to just that answer.
In summary, I think it is definitely a problem when:
1. The model is trained on a certain type of intellectual property
2. The model is then asked to produce content of the same type
3. The authors of the training data did not consent
And slightly less so, but still questionable when instead:
2. The IP becomes an integral part of the new product
which, arguably, is the case for any and all AI training data; individually you could take any of them out and not much would happen, but remove them all and the entire product is gone.
dingnuts|9 months ago
That's a funny example since broadcasters have to pay a fee to say "The Super Bowl" in the first place. If they don't, they have to use some euphemism like "the big game."
The answer is definitely no. You cannot use something that you don't have a license for unless it belongs to you.
amelius|9 months ago
I want "please mail back this physical form, signed".
It's way too easy to use dark patterns to make people inadvertently click buttons. Or to pretend that people did.
detectivestory|9 months ago
DarkWiiPlayer|9 months ago
Again, I think we should require companies to get the user to actively give their consent to these things. Platforms are free to lock or terminate accounts that don't, but they shouldn't be allowed to steal content because someone didn't read an e-mail.