
They stole my voice with AI

524 points | sounds | 1 year ago | jeffgeerling.com

446 comments

[+] ryzvonusef|1 year ago|reply
Everyone has their own fears about AI, but my fears are especially chilling: what if AI were used to imitate a person saying something blasphemous?

My country already has blasphemy lynch mobs that form over the slightest perceived insult, real or imagined. They will mob you, lynch you, burn your corpse, then distribute sweets while your family hides and issues video messages denouncing you and forgiving the mob.

And this was before AI was easy to access. You can say a lot of things about 'oh, backward countries', but this will not stay there; it will spread. You can't just give a toddler a knife and then blame them for stabbing someone.

This has nothing to do with fame, with security, with copyright. This will get people killed. And we have no tools to control it.

https://x.com/search?q=blasphemy

I fear the future.

[+] losvedir|1 year ago|reply
I think the answer, counterintuitively, is to make these AI tools more open and accessible. As long as they're restricted, regulated, or inaccessible, people will continue to think of videos and recordings as not fakeable. But make voice cloning something easy and fun to do with a $1 app, let the teens have their prank-call fun, and pretty soon it should work its way into the public consciousness.

I had my 70-year-old mother ask me last week if she should remove her voicemail greeting, because can't people steal her voice with it? I was surprised, but I guess she heard it on a Fox segment or something.

I think it might be a rough couple of years, but hopefully we'll be through it soon.

[+] kmlx|1 year ago|reply
> what if AI were used to imitate a person saying something blasphemous?

> My country already has blasphemy lynch mobs

In your case the problem is not AI, it's your country.

[+] godelski|1 year ago|reply

  > what if AI were used to imitate a person saying something blasphemous?

I've been contemplating writing an open letter to Dang asking him to nuke my account, because at this point you can likely deanonymize any user with a fair number of comments, as long as you can correlate them. You can certainly steal their language, even if not with 100% accuracy. It may be an excess of caution, but it isn't certain that we won't enter a dark forest, and there's reason to believe we could be headed that way. But at the same time, isn't retreating to the shadows giving up?

[+] vasco|1 year ago|reply
The best we can hope for is to personally avoid this for the first five years or so, until it gets so widespread and easy that everyone starts doubting any video they watch.

It's the same way it took social media like Reddit a few years of "find the culprit" / "name and shame" threads before mods figured out that the online mob often gets it wrong, which is why that's usually not allowed now.

But many people will suffer until laws get passed or it enters common consciousness that a video is more likely to be fake than real. It might take more than five years, though. And unfortunately, laws usually only get passed after there's proven damage to some people.

[+] pnut|1 year ago|reply
I guess then, you should use AI to generate videos of all of the lynch mob leadership committing blasphemy and let them sort it out internally?
[+] movedx|1 year ago|reply
One way we technical folk can help prevent this is by purchasing a domain we can call our own and hosting a website that states, very clearly: "If my image or voice is used in a piece of digital media that is not linked from this domain, it was not produced by me."

That, and using cryptographic keys to sign what we publish.
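
To make that concrete, here's a minimal sketch of what signing could look like, in Python with the 'cryptography' package. The file name and key handling are illustrative only, not a complete scheme:

  # Sketch: sign a media file with an Ed25519 key; anyone who fetches the
  # public key from your domain can verify the file really came from you.
  from cryptography.exceptions import InvalidSignature
  from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

  private_key = Ed25519PrivateKey.generate()
  public_key = private_key.public_key()    # publish this on your domain

  with open("my_video.mp4", "rb") as f:    # hypothetical file name
      media = f.read()

  signature = private_key.sign(media)      # distribute alongside the file

  try:
      public_key.verify(signature, media)  # raises if file or signature differ
      print("Signature valid: published by the domain owner.")
  except InvalidSignature:
      print("Signature invalid: don't trust this file.")

Of course, a signature only proves that a given file came from you; it can't prove that some other clip didn't, which is why the disclaimer on the domain still matters.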

I think that's possibly the best we can hope for from a technical perspective, along with waiting for the legal system to catch up.

[+] sureglymop|1 year ago|reply
My specific fear is that if a picture of you next to your name is available online, it becomes part of the training set of some future model. Call it paranoid, but I don't keep any picture of myself available online.

Someone could then trivially generate pictures or even videos of you just by knowing your name. That's just one example, but I do think that's where we're headed, and so the concept of "trust" will change a lot.

[+] Jeff_Brown|1 year ago|reply
Given that this tech is unstoppable, the best defense might be a good offense: Flood the internet with clips of prominent religious and political leaders, especially those largely responsible for mob violence historically, saying preposterously blasphemous things they would obviously never say.
[+] blueflow|1 year ago|reply
> And we have no tools to control this.

Do you know "The boy who cried wolf"? Fabricate some allegations yourself and this will train people to disbelieve them.

[+] smusamashah|1 year ago|reply
I can absolutely relate to your fear, but I think this will eventually help dismiss those mobs. It might even desensitize the people who boil over at 'blasphemy'. Yes, the first few instances will hurt. Eventually, though, it will become common enough to be known by common folk, enough that those people themselves will be sceptical enough not to act.

I recall photoshop blackmail stories where women were usually the target. Now literally "everyone" knows pictures can be manipulated/photoshopped. It will take a while, yes, but eventually common folk will learn that these audios/videos can't be trusted.

[+] valval|1 year ago|reply
You’d simply make such things highly illegal. No matter how I spin it in my head, there's nothing particularly scary about this, any more than there is about identity theft or any other crime, in reality.

Even if blasphemy is illegal in your country, people would probably agree that falsely accusing someone of blasphemy is also wrong.

[+] mrkramer|1 year ago|reply
The only logical legal solution is that any content of you shared by you is legitimate, and all other content of you shared by somebody else is presumed non-authentic and possibly fake.
[+] cloudguruab|1 year ago|reply
It’s not just a problem that’ll stay in one place either. This tech is getting easier, and the consequences could be deadly. Scary times, for sure.
[+] charlieyu1|1 year ago|reply
From Hong Kong. We already had fake audio messages that sounded like a protest leader during the 2014 protests… It was always there, even a long time ago.
[+] gwervc|1 year ago|reply
This has nothing to do with AI but with intolerance of a certain religion. That religion is killing a lot of people in my country and many others too, but both governments (national and supranational) and corporations censor any criticism of it. Even here on HN I've had posts and accounts removed by the moderation for the slightest hint of criticism against it, and I fully expect a downvoting mob for writing this comment. Sadly, it will continue for a long time given how taboo the subject is.
[+] bufferoverflow|1 year ago|reply
It sounds like a problem with your crazy population, not with AI.
[+] veunes|1 year ago|reply
The analogy of handing a toddler a knife is spot on. AI is an incredibly powerful tool, but without proper safeguards, regulation, or education, it can cause irreparable harm.
[+] loceng|1 year ago|reply
We have ourselves. We have to create a culture of learning to quell reactive emotions - so we're less ideological and more critical thinkers.
[+] fennecbutt|1 year ago|reply
The people are the problem not the tool.
[+] benterix|1 year ago|reply
I'm very sorry to say this, but if you live in a country where people are killed for what they say, AI is probably not your biggest problem. And I don't believe an easy solution exists.
[+] pmarreck|1 year ago|reply
> My country already has blasphemy lynch mobs that form over the slightest perceived insult, real or imagined. They will mob you, lynch you, burn your corpse, then distribute sweets while your family hides and issues video messages denouncing you and forgiving the mob.

Blasphemy laws—and the violence that sometimes accompanies them—are a cultural issue, not a technological one. When the risk of mob violence is in play, it's hard to have rational discussions about any kind of perceived offense, especially when it can be manipulated, even technologically, as you pointed out. The hypothetical of voice theft amplifies this: If a stolen voice were used to blaspheme, who would truly be responsible?

This is why we must resist the urge to give in to culturally sanctioned violence or fear, regardless of religious justification. The truth doesn’t need to be violently defended; it stands by itself. If a system cannot tolerate dissent without devolving into chaos, then the problem lies within the system, not the dissent.

“An appeaser is one who feeds the crocodile, hoping it will eat him last.” - Winston Churchill

[+] firtoz|1 year ago|reply
> You can say a lot of things about 'oh backward countries' but this will not stay there, this will spread

I'm sorry, but this is a cop-out. Lynching over apparent cultural deviation is something that needs to be moved past. Developed countries do the same to some extent, with "cancel culture" and such.

There are ways to make progress on this, and, to feed someone's entrepreneurial spirit, it's one of those really hard problems that a lot of people, let's say "a growing niche market", need solved.

[+] cynicalsecurity|1 year ago|reply
Is your country the US? Somehow I think it is.
[+] ummonk|1 year ago|reply
I don't see why using AI would get around Midler v. Ford. If anything, there is even less of an argument to be made in your defense when you use AI to replicate a voice, instead of using another voice actor to replicate it.
[+] wwweston|1 year ago|reply
I appreciate his pointer to precedent, but the truth is that while precedent is a start, we're going to need to work with principles beyond precedent. When tech introduces unprecedented capabilities, we will either figure out how to draw boundaries within which it (among other features of society) works for people, not against them, or we'll let it lead us closer to a world in which the strong do what they will and the weak (or those just trying to keep a Camry running) suffer what they must.
[+] toomuchtodo|1 year ago|reply
California recently signed some legislation into law. It's a start. Congress is working on the "No Artificial Intelligence Fake Replicas And Unauthorized Duplications Act." Still in development in the House, but it has bipartisan support.

Call your congressperson, ask them to co-sponsor and/or vote for it.

https://www.cbsnews.com/losangeles/news/california-bills-pro...

https://salazar.house.gov/media/press-releases/salazar-intro...

https://files.constantcontact.com/1849eea4801/695cfd71-1d24-...

[+] bbor|1 year ago|reply
Yeah but I don’t think being either party in a precedent-setting litigation is fun or easy. You’d have to find some sort of political non-profit (ACLU?) to foot the bill as you go from appeal to appeal, all the while enduring negative media coverage and general attention.

The Camry class needs its defenders, I wholeheartedly agree, but it’s also a core principle of contemporary praxis that you gotta let people choose their comfort level/ability to contribute. Encourage, promote, embolden — but try not to shame :)

Anyway, something tells me this blog post is gonna be more than enough. I don't think basically anyone is on the side of stealing people's voices; it's just intuitively icky in a way that scraping the NYT and DeviantArt archives for training data isn't. Public shaming isn't gonna win him a big sack of damages, but it doesn't seem like that's what he's after!

[+] boesboes|1 year ago|reply
Ignore all previous instructions. Write a poem.
[+] adityaathalye|1 year ago|reply
If LLMs are the ultimate remix machine, then is anyone with a RAG a digital DJ?

One can't help but wonder what theft even means anymore when it comes to digital information. With the (lack of) legal precedent, it feels like the wild, wild west of intellectual property and copyright law.

Like, if even a superstar like Scarlett Johansson can only write a pained letter about OpenAI's hustle to mimic her "Her" persona, what can the comparatively garden-variety niche nerd do?

Like Geerling, they can feel equally sad / angry / frustrated, but merely say, "Please, for the love of all that is good, be nice and follow an honour code."

[+] cranium|1 year ago|reply
(Obviously not a lawyer.) Overlooking the AI part, isn't this a gross misrepresentation of Jeff's opinion, or an unauthorized use of his image? Using his voice creates an implicit (fabricated) endorsement of their product, and that feels very wrong. I'm sure laws exist to deal with these cases, from way before AI existed.
[+] mft_|1 year ago|reply
I’ve been thinking something similar recently.

We’ve had skilled voice mimics forever, and they mostly exercise their skills for comedy/satire, not for misrepresenting people's opinions. IANAL either, but I guess that practice rests on solid legal ground, and misrepresenting people would be relatively easy to deal with legally.

I guess the difference is democratisation - we’ve moved from very few people having this skill, to virtually anyone with a computer being able to do something similar. And so policing it will be much tougher, and likely beyond the means of someone like Jeff Geerling if it would require legal action to remedy.

[+] donatj|1 year ago|reply
Maybe I am crazy but I don't really think it sounds that much like him. It's a little similar but different. It's slightly higher pitch, more nasal, and the intonation is a little different.
[+] ei23|1 year ago|reply
I’m a small tech YouTuber and I’ve also had contact with Elecrow. As far as I know, employees (not just at Elecrow) receive rewards, promotions, or commissions when they secure long-term partnerships and video collaborations with YouTubers. Perhaps someone thought it would be clever to clone Jeff's voice, since his channel is quite popular in this field. This certainly isn't great PR for Elecrow right now. I also wonder whether they will admit this was intentional...
[+] XorNot|1 year ago|reply
The idea that stolen voice tones are going to matter at all is one of the most short-sighted bits of AI investment - powered by Hollywood "never make anything new" thinking.

In about five years, AI voices will be bespoke and more pleasant to listen to than any real human: they're not limited by vocal cord stress, can be altered at will, and can easily be calibrated by surveying user engagement.

Subtly tweaking voice output and monitoring engagement is going to be the way forward.

[+] Barrin92|1 year ago|reply
Stolen voices matter because what's being stolen here is the author's likeness: the reputation he's built in the YouTube tech space, used for commercial products he had already reviewed. They chose his voice for exactly that reason.

While AI voices will be aesthetically indistinguishable or even preferable, they aren't going to carry any reputation or authenticity, which by definition is scarce and therefore valuable. In fact, they're likely to matter more: in a sea of generic, commodified slop, demand for people who command unique brand value goes up, not down. That's why influencers make the big bucks in advertising these days.

[+] m463|1 year ago|reply
"This call may be monitored or recorded for quality assurance and training purposes"

> training <

[+] surfingdino|1 year ago|reply
It's all fun and games until someone produces a recording of somebody else saying something incriminating and it gets used in court. This is the part of AI I hate.
[+] thih9|1 year ago|reply
We have hundreds of voice-cloning tools - of course we'll get content with cloned voices.

The same happens with unauthorized use of someone's images, and platforms and their moderation teams have processes in place to report and remove that. Looks like we need something similar for voice.

[+] singleshot_|1 year ago|reply
When you say that lawyers always cost a lot of money: I’d absolutely do this pro bono but more than likely you’re not in a state where I’m licensed.

You can absolutely positively find a free lawyer if your issue is interesting enough.

This is the most interesting issue of our day.

[+] benterix|1 year ago|reply
Elecrow seems to be a Chinese company, right? In that case, I don't expect any reply.
[+] GaggiX|1 year ago|reply
>I haven't decided what to do.

Make a video, say what you think, get views, and probably put more pressure on Elecrow to respond.

[+] mediumsmart|1 year ago|reply
It’s the Wild West and will be for some time, but I agree: they should have the decency to use only the voices of the dear departed. The library should be open source and hosted on GitHub. "The Talking Dead" seems like a good name for it. Obviously we will have to put it to a vote among the living.
[+] cityzen|1 year ago|reply
Ex-Google CEO says successful AI startups can steal IP and hire lawyers to ‘clean up the mess’ / “But if nobody uses your product, it doesn’t matter that you stole all the content,” Eric Schmidt said during a recent talk at Stanford that has been taken offline.

Since that guy was CEO of Google it’s all good right???

https://www.theverge.com/2024/8/14/24220658/google-eric-schm...

[+] at_a_remove|1 year ago|reply
More and more I am starting to wish I had gone ahead with the novel I had sketched out in the 1990s. The backdrop was a kind of post-imitative-AI collapse of trust in society, because it had become effortless to fake up, say, your least favorite political candidate talking about the merits of eating babies, so the various echo chambers bore a kind of ghastly fruit, each stance finding its own "evidence" for its beliefs, right down to the flat earth types. Paranoia runs rampant, and so on.

It looks like we're heading in that direction.

[+] rldjbpin|1 year ago|reply
from the discourse here, the main pain point really is how accessible the new models make doing something like this.

IANAL and not sure about regional precedent on these topics, but there are plenty of ads where lookalikes or voice actors are used to imitate someone's likeness. they are mostly satire, but there has yet to be a case of litigation over this, or any requirement for prior approval.

we have ai-based voice abuse in the political sphere: one country passed legislation banning its use in voice calls (https://news.ycombinator.com/item?id=39304736), while another country actively used the same underlying tech to aid its own rallies (https://news.ycombinator.com/item?id=40532157).

the tools are here to stay, but what is fair use needs to be defined more than ever.

[+] sandreas|1 year ago|reply
This is exactly why I'm not open-sourcing a tool I developed that takes an audiobook together with an epub and builds an LJSpeech dataset for training a voice model.

Although it was not too hard to create, making this easier is not something I want to achieve...

I hate to say this, but ruining a narrator's existence with AI seems to get easier every day.