Supertone's Shift offers real-time voice changing technology. It lets users immediately switch to any selected voice. Just pick a voice and begin speaking. Shift is suited for VTubers, content creators, and gamers, as well as anyone who wishes to accurately express their chosen persona's voice. Try out Supertone Shift now.
>> https://product.supertone.ai/shift
I would like some clarity on the Terms of Service clause 4:
> The content created using Supertone Shift remains your property. However, by using our Services, you grant Supertone a worldwide, non-exclusive, royalty-free license to use, reproduce, adapt, and display content solely for the purpose of operating and improving Supertone Shift. This license does not grant Supertone any rights to sell or distribute your content.
Does Supertone Shift need the user content in order to further improve the product during the beta period?
Or does it need the user content in normal operation (for example, running the conversion on remote servers vs local processing)?
I can see some hesitation from people if you're recording everything they say, and keeping that recording for an indefinite period of time.
I can appreciate that there may be a problem enforcing a "Don't use our product for evil" clause, if you can review usage.
There are dozens of other products in this category, including completely open source ones you can fine tune.
Commercial applications like Voice.Ai and Koe are real time and have celebrity and anime voices respectively.
The RVC ecosystem on GitHub has dozens of different real time open source voice changers. I haven't kept up with the SOTA, but they're incredible, fine tunable, and 100% local.
You don't need to look far to understand those terms are standard, and for 'standard' read non-binding, or broad. It doesn't matter what they 'say' here, because you will only find out Supertone is abusing these 'terms' if someone at Supertone lets you know. Meanwhile your voice is siphoned off and used in any way their friends see fit, and no terms laid out here will be broken. As per other replies about standard software terms: for 'standard', read duplicitous.
Why does the Mac installer require admin rights and a restart? Giving admin rights to an installer requires trust in the vendor. Supertone Shift is just a newborn. I cancelled the installation because of that.
I would love to test the technology without the risk of damaging my computer!
I use the great, free, "Suspicious Package" app [0] to inspect installers like these.
In fact, it was Supertone Shift's installer that prodded me to seek it out (I happened to find and install Shift a couple of weeks ago).
In this case, it needs admin permissions to install to `/Library/Application Support` as well as `/Library/Audio`.
It needs to restart in order for the HAL driver to be loaded (this provides the virtual audio interface for using the app with Teams, Zoom, etc.)
The preinstall/postinstall scripts simply handle the app's directory in Application Support.
I decided it was safe enough, and had some fun playing with it. It contacts what it claims are licensing servers (when it starts), and won't start without it. It wanted to keep contacting those servers constantly, but blocking its network access via Little Snitch didn't prevent it from functioning. The network traffic was in the single-digit kilobyte range, so I felt reasonably confident no audio data was being looted.
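A rough back-of-envelope check of that inference, as a sketch only. The bitrates are illustrative assumptions (8 kbit/s for a low-bitrate speech codec, 16 kHz 16-bit mono for uncompressed PCM), not figures measured from Shift, and the helper name `audio_bytes` is made up for this example.

```python
def audio_bytes(bitrate_kbps: float, seconds: float) -> float:
    """Bytes needed to transmit `seconds` of audio at `bitrate_kbps` (kilobits per second)."""
    return bitrate_kbps * 1000 / 8 * seconds


# Assumed, illustrative bitrates; neither is taken from Supertone Shift.
LOW_BITRATE_CODEC_KBPS = 8.0               # heavily compressed speech
PCM_16KHZ_16BIT_KBPS = 16_000 * 16 / 1000  # uncompressed mono PCM, 256 kbit/s

for label, kbps in [
    ("low-bitrate codec", LOW_BITRATE_CODEC_KBPS),
    ("16 kHz 16-bit PCM", PCM_16KHZ_16BIT_KBPS),
]:
    kilobytes = audio_bytes(kbps, 60) / 1000
    print(f"{label}: about {kilobytes:.0f} KB per minute of speech")
```

Under those assumptions, even one minute of heavily compressed speech would be tens of kilobytes, so total traffic in the single-digit kilobyte range is hard to reconcile with audio being uploaded.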
This seems really cool and I can see some great use cases. But the marketing is very odd to me. It says it will let me express myself in a voice that is truly my own…but I can already do that with my natural voice. That seems more likely to be unique than what I would get by adjusting it in software.
I guess the wording is awkward, but as a trans person, I still resonate with it. I'm acutely aware it's not going to be "my voice", but neither is the one I have right now.
Outside of the trans use-case mentioned here, I could imagine some women gamers getting value out of this too. You kinda need voice comms to play some games properly, and not wanting to reveal yourself as a woman online, especially over VoIP, is completely reasonable. Because gamers are terrible. Something like this could make hiding that trivial, assuming the advertised latency is accurate (it would need to be very fast in some games).
I'm not trying to make fun of you, I think you actually have a unique and impressive perspective! I've always hated hearing my voice on answering machines, so if I could choose any voice I'd choose Chris Cornell or Morgan Freeman.
If it were available as a VST plugin for DAWs, it would be even more useful than standalone software. From skimming through the website, it seems that Supertone already make a VST plugin, so it may be a matter of time before Shift becomes a VST too.
Self plug, but I've been developing a local AI voice changing VST [1] (bring your own RVC models, or use builtins). It works in DAWs in realtime on modern Macs.
Would it be possible to embed a watermark in the generated audio? Many people will use voice changing tech for honest purposes, but there will always be those acting to ruin it for the rest of us. There are just too many scenarios where faking your voice confers an illicit benefit.
I know watermarks are never foolproof, but they may deter casual misuse.
Very interested in this answer! I'd really like to see it on the website for any AI I'm considering. It's an entirely different proposition as to whether you're getting a utility or a service.
I can see this being interesting for gamers and more whimsical pursuits, but I'm more curious about neural speech synthesis for both normal speech and singing--the first because there is a pretty strong demand for automated narration of training videos, and the second because of my music hobby--other than vocaloids and a few niche DAWs, I haven't found any nice Open Source tooling for the latter (the former I can mostly do with XTTSv2).
From what I found, XTTSv2 is released under the Coqui Public Model License, which explicitly disallows commercial usage: "This license allows only non-commercial use of a machine learning model and its outputs."
So, from what I understand, I cannot use it and then upload the training video to Youtube. Or can I?
I was trying to make this myself earlier but every single AI model I found used something like 50% of my CPU or GPU.
Any idea how this is possible? Voicemod does something similar and I couldn't figure it out. Is it actually AI or is this just shifting pitch/reverb/etc
Weren't we able to do this before AI? I'm not sure I get what AI is bringing to the table/value-adding for this particular technology, except marketing hype.
Not only that, it's not possible to quit the installer. I had to kill it and then look for the changes it made to the system. I hope I've been able to find them all, but it's really upsetting.
Except for purely non lucrative entertainment use cases with a very high novelty factor, I am struggling to see productive use cases for all these AI applications that don't involve some form of deception or at best disingenuous marketing.
I think this is huge for new content creators that are not native speakers to get rid of the accent. Also if it enables multiple people to sound the same then you can have a YouTube channel with a larger team but only one voice
I can think of a few applications of this technology, although some may fall into the deception category, albeit harmless in my view:
- overcoming social anxiety in voice or online calls. It doesn’t take very many bullying incidents during childhood to become convinced you have a horrible or weird voice. I can see this being used as a useful tool to make people feel more comfortable by having a different voice
- amateur interactive fiction development. Having your characters speak with a real voice in a game in response to the player's commands is a real need, and being able to record it yourself and be a different character would be a huge enabler for a solo developer.
- internal HR videos/podcasts. Creating these can be very expensive; not needing different people to read out the dialogue could significantly reduce the effort in recording and producing these
- another instrument for music creators. Auto tune is a very common tool for music production for all skill levels, and this could be applied in a very similar way
It no doubt can be used for disingenuous purposes, any technology can. But these can be real life improving tools enabling many people to do things they never thought possible.
The idea of participating in a Q&A session in a webinar would be far too confronting and inconceivable for many people, but being able to do it semi-anonymously with a different voice would eliminate much of the anxiety preventing them.
This is huge for indie game developers! They can voice every line of dialogue for every character themselves (or with just 1 professional voice actor).
Text-to-speech AI voice generators exist, but you don't have fine control over the emotion/expressiveness/intonation of the lines like you do with this approach.
All this technology is leading to a world where we can present second-life/alternative identities cohesively online. I wonder if this is going to cause a global decline in the ability for people to express themselves, since it is now so easy to create an identity online that is different than your real-life identity.
I think it's rather sad. Yes, there are some fringe use-cases perhaps but I think this is the wrong direction for humanity. We should find more value in what we already have rather than inventing arbitrary things like this to hide away from real acceptance of ourselves.
It will first lead to a world where fake videos of celebrities will be used to scam you, and your own voice will be used to scam your relatives. Both of those are happening today.
Ironically, this will lead to a world where we need to use these fake personas online to not have our lives messed with offline.
I don’t fully agree with your first paragraph, but I do agree with the second one.
...needing an excuse to get access to microphones to solve the problem stated on their own front page (https://supertone.ai/) as We need voice source material to train the AI.
Funny to see myself being downvoted for a harmless call for a reason, so, a few more comments:
The fact that someone did something and put substantial effort into it is not a good enough reason to justify said effort (other than the benefits of learning) and the product that was created. The world is full of things which actually made it a worse place.
Another comment is actually a meta-comment and might be shocking to some people here:
downvoting is not a good method of making someone stop making statements perceived by some as uncomfortable. In fact, there is nothing wrong with earning points first and then burning them by posting comments that are feared by certain individuals to the level that they "must" be downvoted...
watersb|1 year ago
The challenge here seems overwhelming.
weinzierl|1 year ago
That being said, I hate "remains your property" part. It's just fluff that changes nothing, but distracts from the following sentence.
htrp|1 year ago
we may need your data for some unspecified purpose ("AI model training") that we can't even dream of right now, so we'll just take all the rights
echelon|1 year ago
https://voice.ai
https://koe.ai
https://m.youtube.com/watch?v=zkaBK5erB2c
IndySun|1 year ago
autoexecbat|1 year ago
fzaninotto|1 year ago
desro|1 year ago
[0] https://mothersruin.com/software/SuspiciousPackage/
moralestapia|1 year ago
michaelmior|1 year ago
themoonisachees|1 year ago
corytheboyd|1 year ago
sdfgtr|1 year ago
itishappy|1 year ago
You: Why? I already own a 2002 Ford Escape...
idiotsecant|1 year ago
trashcluster|1 year ago
hollowayaegis|1 year ago
[1] https://audio.sunflower.industries
jl6|1 year ago
terhechte|1 year ago
catapart|1 year ago
rcarmo|1 year ago
edwcross|1 year ago
bogwog|1 year ago
jzemeocala|1 year ago
Would you be interested in any help porting/maintaining a Linux release?
tiborsaas|1 year ago
andoando|1 year ago
drivingmenuts|1 year ago
cma|1 year ago
jen729w|1 year ago
Seriously?
giankam|1 year ago
jen729w|1 year ago
Just improve the installer so I don’t feel like I’ve been scammed by malware!
earthnail|1 year ago
simse|1 year ago
itronitron|1 year ago
camillomiller|1 year ago
dannyw|1 year ago
jack_pp|1 year ago
phil-martin|1 year ago
kthartic|1 year ago
slipheen|1 year ago
nounaut|1 year ago
rtcode_io|1 year ago
gardenhedge|1 year ago
darkoob12|1 year ago
WORMS_EAT_WORMS|1 year ago
vouaobrasil|1 year ago
latexr|1 year ago
kunley|1 year ago
[deleted]
jen729w|1 year ago
hagbard_c|1 year ago
kunley|1 year ago