DarkWiiPlayer|9 months ago

My opinion continues to be that AI companies should have to prove that they have consent to use any and all data their models are trained on.

That is, be able to prove a) that their models were actually trained on the data they claim, b) that they have consent to use said data for AI training, and c) that this consent was given by the actual author or by someone authorized by the author.

I want platforms like SoundCloud, YouTube, etc. to be required to actually send out an e-mail to all of their users: "hey, we will be using your content for AI training, please click here to give permission".

rafaelmn|9 months ago

Even if you can enforce this somehow, other countries will not. Unlike with copyright and patent law in consumer products and content, getting an upper hand in the AI race could have huge implications down the line. So the only government that would enforce this is one that has no chance of competing in this space in the first place (the EU).

dbg31415|9 months ago

Let’s be honest - this is an argument that “the ends justify the means.” But that kind of reasoning should make all of us uneasy. Where do we draw the line? If we eliminated a third of the world’s population to stop global warming, would the noble goal make it acceptable? Clearly not.

We can’t ignore the ethical cost of how AI is being developed - especially when it relies on taking other people’s work without permission. Many of today’s most powerful AI systems were trained on vast datasets filled with human-made content: art, writing, music, code, and more. Much of it was used without consent, credit, or compensation. This isn’t conjecture - it’s been thoroughly documented.

That approach isn’t just legally murky - it’s ethically indefensible. We cannot build the future on a foundation of stolen labor and creativity. Artists, writers, musicians, and other creators deserve both recognition and fair compensation. No matter how impactful the tools become, we cannot accept theft as a business model.

https://arstechnica.com/tech-policy/2025/02/meta-torrented-o...

sofixa|9 months ago

> So the only government that would enforce this is the one that has no chance of competing in this space in the first place (EU)

Mistral waves hello. They're alive and well, and competing.

Also, while the AI Act and copyright are handled at the EU level, I always get the impression that anyone talking about an "EU government" simply doesn't understand the EU. If you think Germans or Slovaks are rooting for Mistral just because it's European, you'd be wrong - they'd be more accepting of it, maybe, due to higher trust in it respecting privacy and related rights, but that's it.

DarkWiiPlayer|9 months ago

> Even if you can enforce this somehow

This is super simple to enforce.

For starters, we only really care about the companies developing big commercial AI products, not the people running said models on their home PCs or anything along those lines.

If a company starts offering a new AI model commercially, you simply send someone to audit it and make sure they can provide proof of consent, have their input data, etc.
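As a sketch of what such an audit could check, imagine a per-item "consent manifest" that ties each training file to a consent record. This is purely hypothetical - the field names and structure below are illustrative, not any real standard:

    import hashlib
    import json

    def manifest_entry(path, consent):
        """Tie a consent record to the exact bytes that were trained on."""
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        return {
            "sha256": digest,               # hash of the training file
            "author": consent["author"],    # who granted permission
            "granted_on": consent["date"],  # when positive consent was given
            "scope": consent["scope"],      # e.g. "AI training"
        }

    # Hypothetical usage: one entry per file in the training set
    entry = manifest_entry("track01.mp3", {
        "author": "alice", "date": "2025-01-15", "scope": "AI training"})
    print(json.dumps(entry, indent=2))

An auditor could then hash the company's actual training corpus and verify every file has a matching, authorized entry.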

In most cases, this should be enough. If there's reason to believe an AI company is actually straight up lying to the authorities, you simply have them re-train their model in a controlled environment.

Oh and no, you don't need cryptographically secure random numbers for AI training and/or operation, so you can easily just save your random seeds along with the input data for perfectly reproducible results.
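A minimal sketch of what saving seeds looks like in practice (PyTorch assumed; caveat: bit-exact reproducibility also requires deterministic kernels, fixed data ordering, and pinned hardware and library versions, not just seeds):

    import random
    import numpy as np
    import torch

    SEED = 42  # store this alongside the input data for the audit

    random.seed(SEED)
    np.random.seed(SEED)
    torch.manual_seed(SEED)  # seeds CPU and all CUDA devices
    torch.use_deterministic_algorithms(True)  # fail loudly on nondeterministic ops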

This isn't an enforcement problem, it's a lobbying problem. Lawmakers are convinced that AI will solve their problems for them, when in reality it's still mostly speculation that someone will, at some point, find a way to make it profitable.

In reality, training and even running AI is still way too expensive for the companies selling it, even without considering the long-term economic impact of the harmful ways these models are trained (artists contribute to GDP directly, open source projects do so indirectly, and free services like Wikipedia are an important part of modern society; AI is imposing massive costs on all of these).

dopidopHN|9 months ago

Not with that attitude, for sure. If the US and/or the European Union do that, that's already a big chunk.

rapind|9 months ago

AI poisoning might be the answer, but it needs a business case. Some sort of SaaS that artists can pay to process their content so that it floods and poisons the crawlers.
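As a toy illustration of the "process their content" step (real poisoning tools like Glaze and Nightshade compute optimized adversarial perturbations; plain random noise like this would not actually fool a model):

    import numpy as np
    from PIL import Image

    def perturb(src, dst, eps=4):
        """Add a small, near-invisible perturbation before publishing an image."""
        img = np.asarray(Image.open(src).convert("RGB"), dtype=np.int16)
        noise = np.random.randint(-eps, eps + 1, img.shape, dtype=np.int16)
        Image.fromarray(np.clip(img + noise, 0, 255).astype(np.uint8)).save(dst)

    perturb("artwork.png", "artwork_protected.png")  # hypothetical filenames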

docdeek|9 months ago

> I want platforms like SoundCloud, YouTube, etc. to be required to actually send out an e-mail to all of their users: "hey, we will be using your content for AI training, please click here to give permission".

Wouldn’t sites like YouTube already have a license to make money off your content anyway? This might be a little out of date, but it notes that even though you own the material you upload to YouTube, by uploading it you grant them a license to make money off it, sub-license it to others for commercial gain, make derivative works, etc. IANAL, but this suggests to me that if you upload it to YouTube, YouTube can license it to OpenAI without needing to inform you or get additional consent. [0]

[0]: https://www.theguardian.com/money/2012/dec/20/who-owns-conte...

DarkWiiPlayer|9 months ago

You can tell I'm European, but I think that in this case, when consumers accepted these conditions, they had no way of understanding the ramifications, so effectively there was no informed consent.

In other words, now that people have had a taste of it and know what they're actually consenting to, companies should have to get renewed consent (positive consent, that is) instead of relying on "you agreed to this before it was even a real thing".

It kind of comes down to the "you can't put a 'you sell your soul' clause in the terms and conditions of a coffee subscription service" mentality: at what point do you simply say "this is obviously in bad faith" and declare it void, rather than just say "it's silly, but you signed it"?

And I think there's massive cultural differences regarding where that line is drawn.

lawlessone|9 months ago

Citing an article from 2012? I don't think much of this kind of training was happening then.

simonw|9 months ago

Should an AI model be able to answer the question "which team won the Super Bowl in 2023" if there are thousands of articles out there containing that information, but not a single one of them has been licensed for use by AI?

DarkWiiPlayer|9 months ago

If you could separate the information from the intellectual property, sure; but if the model is also capable of generating a similar article, that's the point where it starts infringing on the IP of all the authors whose articles were fed into the model.

So in practice, no, it shouldn't. Not because that information itself is bad, but because it probably isn't limited to just that answer.

In summary, I think it is definitely a problem when:

1. The model is trained on a certain type of intellectual property
2. The model is then asked to produce content of the same type
3. The authors of the training data did not consent

And slightly less so, but still questionable when instead:

2. The IP becomes an integral part of the new product

which, arguably, is the case for any and all AI training data; individually you could take any one piece out and not much would happen, but remove them all and the entire product is gone.

dingnuts|9 months ago

No.

That's a funny example, since broadcasters have to pay a fee to say "The Super Bowl" in the first place. If they don't, they have to use some euphemism like "the big game."

The answer is definitely no. You cannot use something that you don't have a license for unless it belongs to you.

amelius|9 months ago

> please click here to give permission

I want "please mail back this physical form, signed".

It's way too easy with dark patterns to make people inadvertently click buttons. Or to pretend that people did.

detectivestory|9 months ago

I'm pretty sure SoundCloud has already done this. I don't believe they gave an option to opt out, though.

DarkWiiPlayer|9 months ago

Then they are stealing people's content and imho should be punished for it. It is baffling that we let companies get away with "if you don't opt out, you agree" or even "you can't opt out; delete your account or you agree", and often hide that in generic-sounding terms & conditions updates.

Again, I think we should require companies to get the user to actively give their consent to these things. Platforms are free to lock or terminate accounts that don't, but they shouldn't be allowed to steal content because someone didn't read an e-mail.