top | item 38542591

Ask HN: AI Voice Reverse

3 points| loregate | 2 years ago

Would it be possible to reverse an AI-generated voice if the speaker spoke the words themselves[0] instead of using TTS[1]?

Since the AI voice is trained, shouldn't a reversing AI also be able to separate out the trained data?

[0] https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI
[1] https://elevenlabs.io/speech-synthesis

5 comments

jruohonen|2 years ago

Can you elaborate on the question?

TTS is just text-to-speech, and there is a huge number of algorithms and tools for speech-to-text (STT).

loregate|2 years ago

With TTS this wouldn't be possible, but I read a news headline[0] and wondered whether you could reverse the converted voice back to the kidnapper's voice.

[0] https://www.theguardian.com/us-news/2023/jun/14/ai-kidnappin...

petercooper|2 years ago

(I spent too much time listening to talk radio in the 90s to understand this question at first. It's not about temporal reversal, but about trying to reveal the original audio behind an AI-enhanced/transcoded voice.)

I think so. There's a whole field of voice biometrics working in this area. I've experimented with such tools and you have to work hard to copy someone's vocabulary, timing, and cadence. If you speak or sing in your normal voice and convert it, there are huge tells, somewhat akin to those used in stylometry to identify the owners of sock accounts (indeed, if someone actually used TTS, it mostly becomes a stylometry problem, unless services like ElevenLabs were to add inaudible watermarks or something).
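To make the stylometry comparison concrete, here is a minimal sketch in Python. It assumes you already have text transcripts of two recordings and compares how often each uses a handful of common function words. The word list and the helper names (`profile`, `style_similarity`) are made up for illustration, and real stylometry or voice-biometrics tools use far richer features (timing and cadence among them), so treat this as a toy and not as how any particular tool works.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny function-word list; real systems use much larger feature sets.
FUNCTION_WORDS = ("the", "a", "of", "and", "to", "in", "that", "it", "is", "i", "you", "but")


def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def style_similarity(text_a, text_b):
    """Compare two transcripts by their function-word usage profiles."""
    return cosine_similarity(profile(text_a), profile(text_b))
```

A score near 1 only says the two transcripts use these words in similar proportions. On short samples that is weak evidence at best, so it would be one signal among many and not an identification.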

upwardbound|2 years ago

Should be possible in principle, although you're talking about a lot of R&D work to turn this high-level idea into something practical. What do you see as the advantage of this approach / what use case did you have in mind?