Personally I love this new AI voice content boom. My favorites are the AI presidents discussing/playing/rating stuff [1] and Attenborough narrating Warhammer [2].
While slightly silly, I think in the future people might start to take the internet way less seriously when you can literally make things up. Which I think is a good trend. People getting depressed, or so consumed by internet culture that they lose all interest in the real world, has been a trend I've disliked. We might just be going back to the time of early, anonymous internet forums, when most people who used them just didn't take them too seriously, and a lot of communities were closed/invite only.
After listening to this, I feel annoyed at the folks in the "AI clones teen girl’s voice in $1M kidnapping scam: ‘I’ve got your daughter’" thread that was on the front page yesterday saying "ppfftt bullshit you'd need to be an idiot to fall for an AI voice" - Guess I'm an idiot. Like it or not, generative media is getting better and better by the day.
Yeah, the voice-gen software has been pretty great lately. The annoyance at folks who wanna ignore that is warranted. Their responses come off very flippant and contrarian.
Now you may want to adjust your estimates of how fast this is going, and how disruptive (and bad) it's going to get. Exponential change is not something we're good at conceptualizing.
I really don't want to go through the singularity. I used to dream of an AI utopia; now I really wish I had been born a hundred years ago instead and only had the Great Depression and war to look forward to.
At this point this is a great metric to measure originality by: if people can’t distinguish the real you from the generative AI you, it just means you’ve fallen into a rut or never had much new to say anyway.
EDIT: I listened longer and the AI actually is throwing in some "uhh"s and pauses. They don't sound completely natural but it's still really impressive.
There is a need for a solution to this problem: something like public/private key cryptography or a "more advanced social security number" to verify the authenticity of a source via digital signature.
AI researchers and companies need to start asking themselves: just because we can, does it mean we should?
The societal implications and risks of this technology are enormous. This could lead to an erosion of public trust in audio and video evidence. Imagine someone creating something that gives dangerous medical advice using Fauci's likeness. Or giving the worst politicians an excuse to say that something they actually said/did was just AI generated.
In the life sciences there are limits to what researchers are allowed to do. Especially with genomics.
There are great opportunities for AI to do good things for humanity. We should not waste the time and effort on trivial things like this that have negative net benefits to us.
It sounded weird at times, almost like someone reading a script with a higher pitch than his normal voice. But the technology is amazing and I would most likely be fooled by a short clip if the content wasn't completely out of place from him.
The ironic thing about this is that it reminds me of how the podcast used to be a few years before Spotify, before it got bogged down in politics, when Joe asked more interesting questions and it was just generally more fun. I just listened to this for 20-odd minutes, which is probably longer than I've listened to the actual podcast in the last month or two.
ChatGPT always answers questions like:
Prompt: Can you tell me about <description of thing>
ChatGPT: Sure, let me tell you <slightly rephrased description of thing>
It's pretty funny to hear that in Sam Altman's voice, along with the umms.
There's a growing number of companies working on voice. We used one recently on a game: it's not quite ready for main characters (yet!), but for background characters and rapid prototyping on main characters (that we plan on rerecording for the final assets) it's already there. It's so close, but none of them quite capture inflection; it's the Stable Diffusion fingers of audio AI.
I think what gives it away is that, when answering questions, ChatGPT first repeats the question.
For example the question is:
"If you were to be found in a small blue room with your favorite food, what would you do?".
The answer would start with:
"if I were found in a small blue room with my favorite food I would..."
Television presenters and interview subjects are coached to answer questions like that, repeating the question before answering it. I wonder if that's where ChatGPT picked up that particular habit.
This isn't a new issue. Image and likeness laws exist for a reason.
A clearer example of how this could be harmful, and how we are already equipped to deal with it: I saw an ad for a supplement that had a convincing AI Joe Rogan "talking about it on his podcast" and how it's the greatest thing that everyone needs to buy. This is illegal currently, and it's not any different from hiring a Joe Rogan impersonator to talk in a similar-looking podcast set to trick people. It's why we have systems to enforce ownership over trademarks, copyrights, and your own image.
I think we're going to need some sort of 'real human' proof system where, when you record any audio or video, you publish the media hash, n-second segment hashes, and signed participants to a ledger/blockchain. You could also build a tamper-proof device that you place in frame that uses a combination of a hard-to-get-at private hardware key and the local ambiance to produce a signature, which you encode as a subtle signal that can later be used to authenticate the video.
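For what it's worth, the hashing half of that idea is easy to sketch. This is only a rough illustration under loud assumptions: segments are cut by byte count instead of by n seconds of decoded audio, the manifest layout and every function name (`segment_hashes`, `build_manifest`, `manifest_id`, `tampered_segments`) are made up for the example, and the signing and ledger steps are left out entirely. The manifest ID is simply the value a participant would sign and publish.

```python
import hashlib
import json


def segment_hashes(media: bytes, segment_size: int) -> list[str]:
    """SHA-256 hex digest of each fixed-size chunk of the media bytes."""
    return [
        hashlib.sha256(media[i:i + segment_size]).hexdigest()
        for i in range(0, len(media), segment_size)
    ]


def build_manifest(media: bytes, segment_size: int, participants: list[str]) -> dict:
    """Collect the whole-file hash, per-segment hashes, and participant names."""
    return {
        "media_sha256": hashlib.sha256(media).hexdigest(),
        "segment_size": segment_size,
        "segment_sha256": segment_hashes(media, segment_size),
        "participants": sorted(participants),
    }


def manifest_id(manifest: dict) -> str:
    """Hash of the canonical JSON form: the value that would be signed and published."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def tampered_segments(media: bytes, manifest: dict) -> list[int]:
    """Indices of segments whose hash no longer matches the manifest."""
    current = segment_hashes(media, manifest["segment_size"])
    recorded = manifest["segment_sha256"]
    length = max(len(current), len(recorded))
    return [
        i for i in range(length)
        if i >= len(current) or i >= len(recorded) or current[i] != recorded[i]
    ]


# Hypothetical stand-in for a recording: 10 segments of 4 bytes each.
original = bytes(range(40))
manifest = build_manifest(original, segment_size=4, participants=["host", "guest"])

edited = bytearray(original)
edited[9] ^= 0xFF  # alter one byte inside segment 2 (bytes 8..11)
```

Per-segment hashes are what let a verifier say which part of a recording was altered, rather than only that something changed somewhere.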
The old boys' club, which has been outsourcing programming to India for 20 years, figured out that they can just fake it since nobody has complained yet.
And before software engineers unionize, they came up with some snake oil to put us in our place, maybe try to force us back into offices, or just eliminate us.
Nobody can deny that Sam Altman and Bill Gates have been trying to "reduce costs" for a long time. The startups with devs in Portugal, Costa Rica, Mexico, Spain, Ukraine, India, China, anyplace where they can pay 5 dollars for a day's work.
When Bill Gates said he would pay programmers 7 dollars an hour, we were all offended; we didn't realize that he was already doing it and that it would be a significant raise in pay.
Fantastic execution. Generating the dialogue I understand; how did they get the voices so incredibly lifelike?! Where do I start when I want text to speech like this?!
I think we now have the technology to build the talking portraits we saw in Harry Potter.
Voice, facial movements, and dialogue that matches the character can all be generated by AI now.
I'm just not sure if this can be done in real time yet to interact with another person.
In the future, you may be able to have a conversation with a portrait of your parents even after they pass away. (So collect as much training data now?)
The current workflow I've seen used is Midjourney for the portrait creation, ElevenLabs for the voice, and D-ID to animate the picture to the voice. Apparently the entire process can take under 15 minutes.
The one thing that I want from all this AI voice stuff is a better selection of voices for the Siris and Alexas out there, i.e. the ability to clone someone's voice onto an assistant.
Some voices are just more pleasant than others & it differs by person doing the listening
Also not just voice synthesis, but the word output should also be modeled after the character. E.g., [alarm 7:30] -> "It's 7:30, wake up you stupid son of a bitch!"
I think we'll see media personalities move to a public key signing model, where authentic streams are signed in some manner with a private key to verify their origin.
I don’t doubt it, but I do doubt that the majority of the public will make an effort to verify signatures before consuming media. I think the public will rely on trusted channels - like the personality’s .com or official YouTube channel.
[+] [-] NalNezumi|2 years ago|reply
[1] https://youtu.be/IkaAZE_UGMo
https://youtu.be/q6ra0KDgVbg
https://youtu.be/iAq-yg72GWw
[2] https://youtu.be/X6RCLJ4pDaw
[+] [-] neom|2 years ago|reply
[+] [-] Wojtkie|2 years ago|reply
[+] [-] NumberWangMan|2 years ago|reply
[+] [-] artificial|2 years ago|reply
[+] [-] ramraj07|2 years ago|reply
[+] [-] Zealotux|2 years ago|reply
[+] [-] overthrow|2 years ago|reply
(Not hating on Rogan, just listen to any real clip and you'll see what I mean, e.g. https://www.youtube.com/watch?v=VvswhogSiY0)
[+] [-] game_the0ry|2 years ago|reply
There is a business to be made there.
[+] [-] lvkv|2 years ago|reply
https://lukas.dev/posts/how-to-trust-again/
Digital signatures, media verification, authenticity and more are all covered!
[+] [-] putcalltheta|2 years ago|reply
Edit: spelling
[+] [-] beebmam|2 years ago|reply
[+] [-] ivanmontillam|2 years ago|reply
Much like you can sign PDFs with your private key certificate[0], you should be able to sign any kind of file, including audio.
Even if my voice was AI-generated, I could endorse it by signing it.
All we need is for CAs to jump on this bandwagon.
--
[0]: https://www.digicert.com/kb/document-signing/how-to-sign-a-p...
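To make the signing idea concrete, here is a toy sketch using only Python's standard library. It is a Lamport one-time signature, swapped in purely because it needs nothing beyond `hashlib` and `secrets`. It is not what a CA-backed setup would use, each key pair may sign only one file, and all names here are hypothetical. A real deployment would presumably use a vetted library with something like Ed25519 or RSA and a certificate.

```python
import hashlib
import secrets


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair():
    """Private key: 256 pairs of random values. Public key: their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_sha256(a), _sha256(b)) for a, b in private]
    return private, public


def _digest_bits(message: bytes) -> list[int]:
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = _sha256(message)
    return [(byte >> (7 - k)) & 1 for byte in digest for k in range(8)]


def sign(private, message: bytes) -> list[bytes]:
    """Reveal one secret per digest bit. A key pair must sign only one message."""
    return [private[i][bit] for i, bit in enumerate(_digest_bits(message))]


def verify(public, message: bytes, signature: list[bytes]) -> bool:
    """Check that each revealed secret hashes to the matching public value."""
    if len(signature) != 256:
        return False
    return all(
        _sha256(sig) == public[i][bit]
        for i, (sig, bit) in enumerate(zip(signature, _digest_bits(message)))
    )


# Hypothetical stand-in for the raw bytes of an audio file.
audio = b"stand-in for the raw bytes of an audio file"
private_key, public_key = generate_keypair()
signature = sign(private_key, audio)
```

Anyone holding `public_key` can call `verify(public_key, audio, signature)` to check the endorsement, whether the audio was recorded or AI-generated; changing a single byte of the file makes verification fail.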
[+] [-] oh_sigh|2 years ago|reply
I.e., go to Joe Rogan's Spotify page and see if the content is linked there.
[+] [-] misiti3780|2 years ago|reply
[+] [-] tourgen|2 years ago|reply
[deleted]
[+] [-] tovej|2 years ago|reply
[+] [-] suddenclarity|2 years ago|reply
[+] [-] rcarr|2 years ago|reply
[+] [-] klik99|2 years ago|reply
[+] [-] polishdude20|2 years ago|reply
The answer would start with: "if I were found in a small blue room with my favorite food I would..."
Normal people don't usually talk like that.
[+] [-] evan_|2 years ago|reply
[+] [-] unknown|2 years ago|reply
[deleted]
[+] [-] aksss|2 years ago|reply
[+] [-] dmix|2 years ago|reply
I have a feeling it's going to remain the trend for every new AI tool.
[+] [-] UnpossibleJim|2 years ago|reply
[+] [-] alexb_|2 years ago|reply
[+] [-] chasd00|2 years ago|reply
[+] [-] siliconc0w|2 years ago|reply
[+] [-] pixl97|2 years ago|reply
[+] [-] birdyrooster|2 years ago|reply
[+] [-] jmkni|2 years ago|reply
[+] [-] concerned_|2 years ago|reply
[+] [-] rcarr|2 years ago|reply
[+] [-] isoprophlex|2 years ago|reply
[+] [-] httpz|2 years ago|reply
[+] [-] breakpointalpha|2 years ago|reply
I signed my parents up for Storyworth, which emails them once a week writing prompts about their life.
I'm still weighing the ethical implications of using the entire dataset 20 years from now to generate a facsimile.
Partly I just feel sad about the future, forever chasing digital ghosts of our past loved ones...
[+] [-] bogwog|2 years ago|reply
But I guess making it not say offensive things might be a problem… or not, depending on the character.
[+] [-] nickthegreek|2 years ago|reply
[+] [-] maCDzP|2 years ago|reply
Or did they just feed a script into some voice generator?
[+] [-] Oarch|2 years ago|reply
[+] [-] Havoc|2 years ago|reply
[+] [-] yazzku|2 years ago|reply
[+] [-] sec400|2 years ago|reply
[+] [-] vannevar|2 years ago|reply
[+] [-] nanidin|2 years ago|reply
[+] [-] causi|2 years ago|reply
https://www.youtube.com/watch?v=kVX1PB19TYE
[+] [-] klik99|2 years ago|reply
[+] [-] js8|2 years ago|reply