Joe Rogan Issues Warning After AI-Generated Version of His Podcast Surfaces

[+] NalNezumi|2 years ago|reply

Personally I love this new AI voice content boom. My favorites are the AI presidents discussing/playing/rating stuff [1] and Attenborough narrating Warhammer [2].

While slightly silly, I think in the future people might start to take the internet way less seriously when you can literally make things up. Which I think is a good trend. People getting depressed or so consumed by internet culture that they lose all interest in the real world has been a trend I've disliked. We might just be going back to the time of early, anonymous internet forums, where most people who used them just didn't take it too seriously, and a lot of communities were closed/invite-only.

[1] https://youtu.be/IkaAZE_UGMo

https://youtu.be/q6ra0KDgVbg

https://youtu.be/iAq-yg72GWw

[2] https://youtu.be/X6RCLJ4pDaw

[+] neom|2 years ago|reply

After listening to this, I feel annoyed at the folks in the "AI clones teen girl’s voice in $1M kidnapping scam: ‘I’ve got your daughter’" thread that was on the front page yesterday saying "ppfftt bullshit you'd need to be an idiot to fall for an AI voice" - Guess I'm an idiot. Like it or not, generative media is getting better and better by the day.

[+] Wojtkie|2 years ago|reply

Yeah, the voice-gen software has been pretty great lately. The annoyance at folks who wanna ignore that is warranted. Their responses come off very flippant and contrarian.

[+] NumberWangMan|2 years ago|reply

Now you may want to adjust your estimations on how fast this is going, and how disruptive (and bad) it's going to be further. Exponential change is not something we're good at conceptualizing.

I really don't want to go through the singularity. I used to dream of an AI utopia; now I really wish I had been born a hundred years ago instead and only had the Great Depression and war to look forward to.

[+] artificial|2 years ago|reply

This is also how swatting works: https://www.vice.com/en/article/k7z8be/torswats-computer-gen...

[+] ramraj07|2 years ago|reply

At this point this is a great metric to measure originality by: if people can’t distinguish the real you from the generative AI you, it just means you’ve fallen into a rut or never had much new to say anyway.

[+] Zealotux|2 years ago|reply

Here's the "podcast": https://www.youtube.com/watch?v=meu0CoYv3z8, incredible.

[+] overthrow|2 years ago|reply

That is really well done. The only thing that gives it away (to my ears at least) is it's missing the trademark stutter/repeating words.

(Not hating on Rogan, just listen to any real clip and you'll see what I mean, e.g. https://www.youtube.com/watch?v=VvswhogSiY0)

EDIT: I listened longer and the AI actually is throwing in some "uhh"s and pauses. They don't sound completely natural but it's still really impressive.

[+] game_the0ry|2 years ago|reply

There is a need for a solution to this problem -- something like public/private key cryptography or a "more advanced social security number" to verify the authenticity of a source via digital signature.

There is a business to be made there.

[+] lvkv|2 years ago|reply

I think you’d like a blog post I wrote in November where I put forward an outline of what such a system would look like:

https://lukas.dev/posts/how-to-trust-again/

Digital signatures, media verification, authenticity and more are all covered!

[+] putcalltheta|2 years ago|reply

AI researchers and companies need to start asking themselves: just because we can, does it mean we should?

The societal implications and risks of this technology are enormous. This could lead to an erosion of public trust in audio and video evidence. Imagine someone creating something that gives dangerous medical advice using Fauci's likeness. Or giving the worst politicians an excuse to say that something they actually said/did was just AI generated.

In the life sciences there are limits to what researchers are allowed to do. Especially with genomics.

There are great opportunities for AI to do good things for humanity. We should not waste the time and effort on trivial things like this that have negative net benefits to us.

Edit: spelling

[+] beebmam|2 years ago|reply

This exists, and it's a joint effort across a lot of tech companies: https://en.wikipedia.org/wiki/C2PA

[+] ivanmontillam|2 years ago|reply

I'd vouch for this.

Much like you can sign PDFs with your private key certificate[0], you should be able to sign any kind of file, including audio.

Even if my voice was AI-generated, I could endorse it by signing it.

All we need are CAs jumping on this bandwagon.

--

[0]: https://www.digicert.com/kb/document-signing/how-to-sign-a-p...
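The sign-any-file idea in this comment can be sketched in a few lines. What follows is a toy illustration only, not how any CA or PDF signer actually works: it uses textbook RSA with no padding and a deliberately tiny key built from two known Mersenne primes, so it is insecure by design. The names `sign`, `verify` and `digest_int` are hypothetical helpers made up for this sketch. A real system would use a vetted library and a standard scheme such as Ed25519 or RSA-PSS.

```python
import hashlib

# Toy key material (assumption for illustration): two known Mersenne primes.
# The resulting modulus is about 150 bits, far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                               # public modulus
E = 65537                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent (needs Python 3.8+)


def digest_int(data: bytes) -> int:
    """SHA-256 of the file bytes, reduced modulo N so it fits the toy key."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Textbook RSA signature: raise the digest to the private exponent."""
    return pow(digest_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Anyone holding (N, E) can check that the signature matches the bytes."""
    return pow(signature, E, N) == digest_int(data)


audio = b"stand-in bytes for an audio file"
signature = sign(audio)
print(verify(audio, signature))             # the untouched file checks out
print(verify(audio + b"!", signature))      # any change breaks the signature
```

The point of the sketch is that the signature says nothing about how the audio was produced, only that the key holder endorsed those exact bytes, which is the property the comment is after.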

[+] oh_sigh|2 years ago|reply

Why wouldn't the solution be to just check the canonical source for the content?

I.e., go to Joe Rogan's Spotify page and see if the content is linked there.

[+] misiti3780|2 years ago|reply

bitcoin fixes this.

[+] tourgen|2 years ago|reply

[deleted]

[+] tovej|2 years ago|reply

You mean like China's social credit system? Sounds dystopian

[+] suddenclarity|2 years ago|reply

It sounded weird at times, almost like someone reading a script with a higher pitch than his normal voice. But the technology is amazing and I would most likely be fooled by a short clip if the content wasn't completely out of place from him.

[+] rcarr|2 years ago|reply

The ironic thing about this is that it reminds me of how the podcast used to be a few years before Spotify, before it got bogged down in politics, when Joe asked more interesting questions and it was just generally funner. I just listened to this for 20-odd minutes, which is probably longer than I've listened to the actual podcast in the last month or two.

[+] klik99|2 years ago|reply

ChatGPT always answers questions like:

Prompt: Can you tell me about <description of thing>

ChatGPT: Sure, let me tell you <slightly rephrased description of thing>

It's pretty funny to hear that in Sam Altman's voice, along with umms.

There's a growing number of companies working on voice - we used one recently on a game, it's not quite ready for main characters (yet!) but for background characters and rapid prototyping on main characters (that we plan on rerecording for the final assets) it's already there. It's so close, but none of them quite capture inflection, it's the stable diffusion fingers of audio AI

[+] polishdude20|2 years ago|reply

I think what gives it away is that when answering questions, ChatGPT first repeats the question. For example, the question is: "If you were to be found in a small blue room with your favorite food, what would you do?".

The answer would start with: "if I were found in a small blue room with my favorite food I would..."

Normal people don't usually talk like that.

[+] evan_|2 years ago|reply

Television presenters and interview subjects are coached to answer questions like that, repeating the question before answering it- I wonder if that's where ChatGPT picked up that particular habit.

[+] unknown|2 years ago|reply

[deleted]

[+] aksss|2 years ago|reply

Pageant and spelling bee contestants aside..

[+] dmix|2 years ago|reply

Rogan is always the first one to be used as input/demo for this stuff.

I have a feeling it's going to remain the trend for every new AI tool.

[+] UnpossibleJim|2 years ago|reply

There's just so much data available, I don't see how it couldn't be the prevailing trend.

[+] alexb_|2 years ago|reply

This isn't a new issue. Image and likeness laws exist for a reason.

A clearer example of how this could be harmful, and how we are already equipped to deal with it: I saw an ad for a supplement that had a convincing AI Joe Rogan "talking about it on his podcast" and saying how it's the greatest thing that everyone needs to buy. This is illegal currently, and it's not any different from hiring a Joe Rogan impersonator to talk in a similar-looking podcast set to trick people. It's why we have systems to enforce ownership over trademarks, copyrights, and your own image.

[+] chasd00|2 years ago|reply

I bet the lawyers are so happy about generative AI they can hardly count.

[+] siliconc0w|2 years ago|reply

I think we're going to need some sort of 'real human' proof system where, when you record any audio or video, you publish the media hash, n-second segment hashes, and signed participants to a ledger/blockchain. You could also build a tamper-proof device that you place in frame that uses a combination of a hard-to-get-at private hardware key and the local ambiance to produce a signature you encode as a subtle signal that can later be used to authenticate the video.
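The "media hash plus segment hashes" part of this proposal can be sketched as below. This is a minimal sketch under stated assumptions: fixed-size byte chunks stand in for the n-second segments (a real tool would cut on timestamps), the ledger and the signed participant list are left out entirely, and `media_manifest` and `changed_segments` are hypothetical names invented for the example.

```python
import hashlib


def media_manifest(data: bytes, segment_size: int) -> dict:
    """Hash the whole recording plus every fixed-size segment of it."""
    segment_hashes = [
        hashlib.sha256(data[i:i + segment_size]).hexdigest()
        for i in range(0, len(data), segment_size)
    ]
    return {
        "media_hash": hashlib.sha256(data).hexdigest(),
        "segment_hashes": segment_hashes,
    }


def changed_segments(published: dict, data: bytes, segment_size: int) -> list:
    """Return the indexes of segments that differ from the published manifest."""
    current = media_manifest(data, segment_size)["segment_hashes"]
    original = published["segment_hashes"]
    longest = max(len(current), len(original))
    return [
        i for i in range(longest)
        # A segment that exists on only one side counts as changed.
        if i >= len(current) or i >= len(original) or current[i] != original[i]
    ]


recording = bytes(range(256)) * 4            # 1024 stand-in bytes
manifest = media_manifest(recording, 256)    # what would be published

tampered = bytearray(recording)
tampered[300] ^= 0xFF                        # flip one byte inside segment 1
print(changed_segments(manifest, bytes(tampered), 256))
```

Per-segment hashes are what let a verifier say which stretch of a recording was altered, or confirm that a short clip is an unmodified excerpt, instead of only learning that the file as a whole no longer matches.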

[+] pixl97|2 years ago|reply

Oh no, your hardware key was destroyed in a fire, you don't exist any more.

[+] birdyrooster|2 years ago|reply

Also other signals like GPS, ambient radio waves, gyroscopic information, etc. Whoever simulates reality better wins.

[+] jmkni|2 years ago|reply

This sounds like chaos

[+] concerned_|2 years ago|reply

The old boys' club, which has been outsourcing programming to India for 20 years, figured out that they can just fake it since nobody has complained yet.

And before software engineers unionize they came up with some snake oil to put us in our place, maybe try to force us back into offices, or just eliminate us.

Nobody can deny that Sam Altman and Bill Gates have been trying to "reduce costs" for a long time. The startups with devs in Portugal, Costa Rica, Mexico, Spain, Ukraine, India, China, anyplace where they can pay 5 dollars for a day's work.

When Bill Gates said he would pay programmers 7 dollars an hour, we were all offended; we didn't realize that he was already doing it and that it would be a significant raise in pay.

[+] rcarr|2 years ago|reply

This is nuts, it even gets the plosive sounds on the mic right.

[+] isoprophlex|2 years ago|reply

Fantastic execution. Generating the dialogue I understand; how did they get the voices so incredibly lifelike?! Where do I start when I want text to speech like this?!

[+] httpz|2 years ago|reply

I think we now have the technology to build the talking portraits we saw in Harry Potter. Voice, facial movements, and dialogue that matches the character can all be generated by AI now. I'm just not sure if this can be done in real time yet to interact with another person.

In the future, you may be able to have a conversation with a portrait of your parents even after they pass away. (So collect as much training data now?)

[+] breakpointalpha|2 years ago|reply

As weird as it sounds I had this exact thought.

I signed my parents up for Storyworth, which emails them once a week writing prompts about their life.

I'm still weighing the ethical implications of using the entire dataset 20 years from now to generate a facsimile.

Partly I just feel sad about the future, forever chasing digital ghosts of our past loved ones...

[+] bogwog|2 years ago|reply

That would be a pretty awesome installation at Universal Studios.

But I guess making it not say offensive things might be a problem… or not, depending on the character.

[+] nickthegreek|2 years ago|reply

The current workflow I've seen used is Midjourney for the portrait creation, ElevenLabs for the voice, and D-ID to animate the picture to the voice. Apparently the entire process can take under 15 minutes.

[+] maCDzP|2 years ago|reply

I listened and I think it’s really good. I wonder how much effort went into polishing the sound.

Or did they just feed a script into some voice generator?

[+] Oarch|2 years ago|reply

The dialogue is excellent but the intonation doesn't sound quite right. The voice is a bit high as if he's tense or something.

[+] Havoc|2 years ago|reply

The one thing that I want from all this AI voice stuff is a better selection of voices for the Siris and Alexas out there, i.e. the ability to clone someone's voice onto an assistant.

Some voices are just more pleasant than others, and it differs by the person doing the listening.

[+] yazzku|2 years ago|reply

We know you want the Trump voice.

Also not just voice synthesis, but the word output should also be modeled after the character. E.g., [alarm 7:30] -> "It's 7:30, wake up you stupid son of a bitch!"

[+] sec400|2 years ago|reply

You're telling me this was fake all along? https://twitter.com/TallBart/status/1643108942627864577

[+] vannevar|2 years ago|reply

I think we'll see media personalities move to a public-key cryptography model, where authentic streams are signed in some manner with a private key to verify their origin.

[+] nanidin|2 years ago|reply

I don’t doubt it, but I do doubt that the majority of the public will make an effort to verify signatures before consuming media. I think the public will rely on trusted channels - like the personality’s .com or official YouTube channel.

[+] causi|2 years ago|reply

Personally I prefer AI Joe Rogan discussing Bionicle with AI Jordan Peterson.

https://www.youtube.com/watch?v=kVX1PB19TYE

[+] klik99|2 years ago|reply

This is hilarious because Peterson explaining Bionicle lore is not that far removed from how he rambles on about mythology.

[+] js8|2 years ago|reply

I enjoyed the Factorio one.

171 comments