top | item 38302523

belugacat | 2 years ago

Right now this requires API tokens and depends on third-party companies that will cut off your access if they decide they don’t like you.

The moment these models can run locally on the kind of cheap hardware that phone scam operations have will be the real Pandora’s box moment. (I give it 3-5 years or so)

fotcorn|2 years ago

The recently released XTTS-v2 model[0] from coqui.ai comes very close to what ElevenLabs[1] can do. It runs reasonably fast on a recent GPU, and should also work on CPU. Requires a 3 second (!) clip of the voice you want to clone. The license does not allow commercial use.

0: https://huggingface.co/coqui/XTTS-v2

1: https://elevenlabs.io/

alpaca128|2 years ago

> Requires a 3 second (!) clip of the voice you want to clone.

Sure, if you want a guaranteed uncanny valley experience. There is no way a few seconds are enough to cover all the ways a specific person pronounces things. A person's voice is much more than just pitch, and with a 3-second sample anyone who knows them will be able to tell something's off within 3 seconds.

Ukv|2 years ago

Could work for spear-phishing, or impersonating a widely-known trusted figure. I can't really see it working for cold-calls that pretend to be someone the victim knows (like the terrifying ransom calls), since the operations work at a huge scale expecting most people to not even pick up a "scam likely" call. Even if model tuning is free and instant, just having to find a voice clip of the person prior to each unanswered automated call would tank the quantity they're able to make.

Though, for the same reason websites always attribute data breaches to a "highly sophisticated targeted attack", I imagine there will be some unevidenced claims that this is what scammers did to them - people don't want to have been fooled by something simple.

no_time|2 years ago

I just can't get excited over most trending submissions here on HN for the exact same reason over the last 2 years. This current advancement of AI doesn't feel like such a "new frontier" as other advancements in tech at their early adopter phase. The internet (the closest tech invention of similar magnitude imo) had at least a decade-long "wild west" before all the big players we know entrenched themselves with monopolies and legislation.

With AI we have barely begun, and yet the cards are already dealt.

agloe_dreams|2 years ago

I kinda wonder how close you could get with Core ML on iOS. Apple already ships some iffy voice-cloning software.