Show HN: Supertone Shift – AI powered Real-time voice changer

watersb|1 year ago

Very interesting!

I would like some clarity on the Terms of Service clause 4:

> The content created using Supertone Shift remains your property. However, by using our Services, you grant Supertone a worldwide, non-exclusive, royalty-free license to use, reproduce, adapt, and display content solely for the purpose of operating and improving Supertone Shift. This license does not grant Supertone any rights to sell or distribute your content.

Does Supertone Shift need the user content in order to further improve the product during the beta period?

Or does it need the user content in normal operation (for example, running the conversion on remote servers vs local processing)?

I can see some hesitation from people if you're recording everything they say, and keeping that recording for an indefinite period of time.

I can appreciate that there may be a problem enforcing a "Don't use our product for evil" clause unless you can review usage.

The challenge here seems overwhelming.

weinzierl|1 year ago

The phrasing is pretty standard; the important part is the middle sentence. Often it includes irrevocable, transferable and sublicensable as well.

That being said, I hate the "remains your property" part. It's just fluff that changes nothing, but distracts from the following sentence.

htrp|1 year ago

Looks like Facebook's ToS:

we may need your data for some unspecified purpose ("AI model training") that we can't even dream of right now, so we'll just take all the rights.

echelon|1 year ago

There are dozens of other products in this category, including completely open source ones you can fine tune.

Commercial applications like Voice.Ai and Koe are real time and have celebrity and anime voices respectively.

The RVC ecosystem on GitHub has dozens of different real time open source voice changers. I haven't kept up with the SOTA, but they're incredible, fine tunable, and 100% local.

https://voice.ai

https://koe.ai

https://m.youtube.com/watch?v=zkaBK5erB2c

IndySun|1 year ago

You don't need to look far to understand that those terms are standard, and for 'standard' read non-binding, or broad. It doesn't matter what they 'say' here, because you will only find out about Supertone abusing these 'terms' if someone at Supertone lets you know. Meanwhile your voice is siphoned off and used in any way their friends see fit, and no terms laid out here will be broken. As per other replies: for standard software terms, read duplicitous.

autoexecbat|1 year ago

It could at least have a time limit.

fzaninotto|1 year ago

Why does the Mac installer require admin rights and a restart? Giving admin rights to an installer requires trust in the vendor. Supertone Shift is just a newborn. I cancelled the installation because of that.

I would love to test the technology without the risk of damaging my computer!

desro|1 year ago

I use the great, free, "Suspicious Package" app [0] to inspect installers like these.

In fact, it was Supertone Shift's installer that prodded me to seek it out (I happened to find and install Shift a couple of weeks ago).

In this case, it needs admin permissions to install to `/Library/Application Support` as well as `/Library/Audio`.

It needs to restart in order for the HAL driver to be loaded (this provides the virtual audio interface for using the app with Teams, Zoom, etc.)

The preinstall/postinstall scripts simply handle the app's directory in Application Support.

I decided it was safe enough, and had some fun playing with it. It contacts what it claims are licensing servers (when it starts), and won't start without it. It wanted to keep contacting those servers constantly, but blocking its network access via Little Snitch didn't prevent it from functioning. The network traffic was in the single-digit kilobyte range, so I felt reasonably confident no audio data was being looted.

[0] https://mothersruin.com/software/SuspiciousPackage/
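For a rough sense of scale behind the "single-digit kilobyte" observation, here is a back-of-the-envelope sketch. The audio parameters (16 kHz, 16-bit, mono PCM) and the 16 kbit/s compressed bitrate are illustrative assumptions, not anything known about Supertone Shift, and the comment above does not say over what period the traffic was measured.

```python
def raw_pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Data rate of uncompressed PCM audio in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


def seconds_of_audio(traffic_bytes, bytes_per_second):
    """How many seconds of audio would fit in the given amount of traffic."""
    return traffic_bytes / bytes_per_second


# Assumption: speech-quality capture at 16 kHz, 16-bit, mono.
raw_rate = raw_pcm_bytes_per_second(16_000, 16, 1)

# Assumption: a hypothetical low-bitrate speech codec at 16 kbit/s.
compressed_rate = 16_000 // 8

# Upper end of "single-digit kilobytes".
observed_traffic = 9 * 1024

print("raw PCM seconds:", seconds_of_audio(observed_traffic, raw_rate))
print("compressed seconds:", seconds_of_audio(observed_traffic, compressed_rate))
```

Under these assumptions, that much traffic could carry well under a second of raw audio, or a few seconds of heavily compressed speech. That is consistent with a licensing check, though it does not prove that nothing else was sent.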

moralestapia|1 year ago

Thanks for this, I was very eager to try it out but this is always a deal breaker.

michaelmior|1 year ago

This seems really cool and I can see some great use cases. But the marketing is very odd to me. It says it will let me express myself in a voice that is truly my own…but I can already do that with my natural voice. That seems more likely to be unique than what I would get by adjusting it in software.

themoonisachees|1 year ago

I guess the wording is awkward, but as a trans person, I still resonate with it. I'm acutely aware it's not going to be "my voice", but neither is the one I have right now.

corytheboyd|1 year ago

Outside of the trans use-case mentioned here, I could imagine some women gamers getting value out of this too. You kinda need voice comms to play some games properly, and not wanting to reveal yourself as a woman online, especially over VoIP, is completely reasonable. Because gamers are terrible. Something like this could make hiding that trivial, assuming the claimed latency is accurate (it would need to be very fast in some games).

sdfgtr|1 year ago

That particular line is definitely directed towards people with gender identity issues.

itishappy|1 year ago

Salesperson: You can test drive any car on the lot!

You: Why? I already own a 2002 Ford Escape...

I'm not trying to make fun of you, I think you actually have a unique and impressive perspective! I've always hated hearing my voice on answering machines, so if I could choose any voice I'd choose Chris Cornell or Morgan Freeman.

idiotsecant|1 year ago

Pro tip: Some people do not consider their natural voice 'their' voice.

trashcluster|1 year ago

If it were available as a VST plugin for DAWs, it would be even more useful than as standalone software. From skimming through the website, it seems that Supertone already makes a VST plugin, so it may be a matter of time before Shift becomes a VST too.

hollowayaegis|1 year ago

Self plug, but I've been developing a local AI voice changing VST [1] (bring your own RVC models, or use builtins). It works in DAWs in realtime on modern Macs.

[1] https://audio.sunflower.industries

jl6|1 year ago

Would it be possible to embed a watermark in the generated audio? Many people will use voice changing tech for honest purposes, but there will always be those acting to ruin it for the rest of us. There are just too many scenarios where faking your voice confers an illicit benefit.

I know watermarks are never foolproof, but they may deter casual misuse.

terhechte|1 year ago

Curious Question: Given the low latency, does it run the computation on device or over the network? If on device, are there minimum CPU requirements?

catapart|1 year ago

Very interested in this answer! I'd really like to see it on the website for any AI I'm considering. It's an entirely different proposition as to whether you're getting a utility or a service.

rcarmo|1 year ago

I can see this being interesting for gamers and more whimsical pursuits, but I'm more curious about neural speech synthesis for both normal speech and singing--the first because there is a pretty strong demand for automated narration of training videos, and the second because of my music hobby--other than vocaloids and a few niche DAWs, I haven't found any nice Open Source tooling for the latter (the former I can mostly do with XTTSv2).

edwcross|1 year ago

From what I found, XTTSv2 is released under the Coqui Public Model License, which explicitly disallows commercial usage: "This license allows only non-commercial use of a machine learning model and its outputs."

So, from what I understand, I cannot use it and then upload the training video to YouTube. Or can I?

bogwog|1 year ago

Seems like we're getting closer and closer to Star Trek's universal translator

jzemeocala|1 year ago

Fun looking product. Sad to see no Linux support (yet?).

Would you be interested in any help porting/maintaining a Linux release?

tiborsaas|1 year ago

This looks like an amazing tool for indie game developers. Even musicians could find this an amazing help to add some unique tones.

andoando|1 year ago

I was trying to make this myself earlier but every single AI model I found used something like 50% of my CPU or GPU.

Any idea how this is possible? Voicemod does something similar and I couldn't figure it out. Is it actually AI, or is this just shifting pitch/reverb/etc.?

drivingmenuts|1 year ago

Weren't we able to do this before AI? I'm not sure I get what AI is bringing to the table/value-adding for this particular technology, except marketing hype.

cma|1 year ago

Wasn't that very basic pitch shifting only?
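For reference, the most basic kind of pitch shifting can be sketched in a few lines. This is only an illustration of naive resampling (reading through the samples faster or slower), not how any product mentioned in this thread works; the function names are made up for the example. It also changes the clip's duration, which duration-preserving techniques such as a phase vocoder or PSOLA avoid.

```python
import math


def sine(freq_hz, sample_rate_hz, n_samples):
    """Generate a test tone as a list of floats."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n_samples)]


def pitch_shift_by_resampling(samples, ratio):
    """Step through `samples` at `ratio` times normal speed, using linear
    interpolation. Pitch is multiplied by `ratio`; length is divided by it."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    out = []
    pos = 0.0
    last = len(samples) - 1
    while pos < last:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        pos += ratio
    return out


def estimate_frequency(samples, sample_rate_hz):
    """Crude frequency estimate from the number of sign changes."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a < 0) != (b < 0))
    duration_s = len(samples) / sample_rate_hz
    return crossings / 2 / duration_s


SAMPLE_RATE = 8000
tone = sine(220, SAMPLE_RATE, SAMPLE_RATE)      # one second at 220 Hz
shifted = pitch_shift_by_resampling(tone, 2.0)  # up one octave, half as long

print("original estimate (Hz):", estimate_frequency(tone, SAMPLE_RATE))
print("shifted estimate (Hz):", estimate_frequency(shifted, SAMPLE_RATE))
```

A shift like this moves every frequency by the same factor, so it changes how high or low a voice sounds but not whose voice it resembles, which is presumably the gap the model-based converters are aiming at.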

jen729w|1 year ago

> The installation has completed. Please restart your Mac.

Seriously?

giankam|1 year ago

Not only that, it's not possible to quit the installer. I had to kill it and then look for changes made to the system. I hope I've been able to find them all, but it's really upsetting.

jen729w|1 year ago

So, Supertone Shift creators: this is really good! The first time you hear your own voice as a K-pop star or a nymph it’s genuinely startling.

Just improve the installer so I don’t feel like I’ve been scammed by malware!

earthnail|1 year ago

Same question. What did I just install that required a restart?

simse|1 year ago

It's worth it!

itronitron|1 year ago

I wonder if this could be applied to educational videos to make the material seem less challenging for children.

camillomiller|1 year ago

Except for purely non-lucrative entertainment use cases with a very high novelty factor, I am struggling to see productive use cases for all these AI applications that don't involve some form of deception or, at best, disingenuous marketing.

dannyw|1 year ago

As someone who makes indie games as a passion and creative outlet, tools like these drastically expand my creative possibilities.

jack_pp|1 year ago

I think this is huge for new content creators who are not native speakers and want to get rid of their accent. Also, if it enables multiple people to sound the same, then you can have a YouTube channel with a larger team but only one voice.

phil-martin|1 year ago

I can think of a few applications of this technology, although some may fall into the deception category, albeit harmless in my view:

- overcoming social anxiety in voice or online calls. It doesn’t take very many bullying incidents during childhood to become convinced you have a horrible or weird voice. I can see this being used as a useful tool to make people feel more comfortable by having a different voice

- amateur interactive fiction development. Having your characters speak with a real voice in a game in response to the player's commands is a real need, and being able to record it yourself and be a different character would be a huge enabler for a solo developer creating something.

- internal HR videos/podcasts. Creating these can be very expensive; not needing different people to read out dialogue could significantly reduce the effort of recording and producing them

- another instrument for music creators. Auto tune is a very common tool for music production for all skill levels, and this could be applied in a very similar way

It no doubt can be used for disingenuous purposes, any technology can. But these can be real life improving tools enabling many people to do things they never thought possible.

The idea of participating in a Q&A session in a webinar would be far too confronting and inconceivable for many people, but being able to do it semi-anonymously with a different voice would eliminate much of the anxiety holding them back.

kthartic|1 year ago

This is huge for indie game developers! They can voice every line of dialogue for every character themselves (or with just 1 professional voice actor).

Text-to-speech AI voice generators exist, but you don't have fine control over the emotion/expressiveness/intonation of the lines like you do with this approach.

slipheen|1 year ago

I would imagine the same sort of reasons people do VTubing in general, such as safety and anonymity.

nounaut|1 year ago

If they generate good quality then I suppose voice acting could have good use of it.

rtcode_io|1 year ago

Nice to see a venture from South Korea!

gardenhedge|1 year ago

This is awesome. Very futuristic

darkoob12|1 year ago

More dystopian. Yet another "contribution" of AI to destroying society via misinformation.

WORMS_EAT_WORMS|1 year ago

Congrats! This is amazing work

vouaobrasil|1 year ago

All this technology is leading to a world where we can present second-life/alternative identities cohesively online. I wonder if this is going to cause a global decline in the ability for people to express themselves, since it is now so easy to create an identity online that is different than your real-life identity.

I think it's rather sad. Yes, there are some fringe use-cases perhaps but I think this is the wrong direction for humanity. We should find more value in what we already have rather than inventing arbitrary things like this to hide away from real acceptance of ourselves.

latexr|1 year ago

It will first lead to a world where fake videos of celebrities will be used to scam you, and your own voice will be used to scam your relatives. Both of those are happening today.

Ironically, this will lead to a world where we need to use these fake personas online to not have our lives messed with offline.

I don’t fully agree with your first paragraph, but I do agree with the second one.

kunley|1 year ago

[deleted]

jen729w|1 year ago

I make videos; it might be handy to be able to ‘be’ someone else. I can for sure see a use for this.

hagbard_c|1 year ago

...needing an excuse to get access to microphones to solve the problem stated on their own front page (https://supertone.ai/) as We need voice source material to train the AI.

kunley|1 year ago

Funny to see myself being downvoted for a harmless call for a reason, so, a few more comments:

The fact that someone did something and put substantial effort into it is not a good enough reason to justify said effort (other than the benefits of learning) and the product that was created. The world is full of things which actually made it a worse place.

Another comment is actually a meta-comment and might be shocking to some people here:

Downvoting is not a good method of making someone stop making statements perceived by some as uncomfortable. In fact, there is nothing wrong with earning points first and then burning them by posting comments that are feared by certain individuals to the level that they "must" be downvoted...

98 comments