I wouldn't use Edge TTS for commercial projects since it's using an internal Microsoft API that was reverse engineered.
If you are looking for a commercial API, I just launched a TTS API powered by the best-performing open source model, Kokoro: https://www.lemonfox.ai/text-to-speech-api. The API is compatible with OpenAI and ElevenLabs and up to 25x cheaper.
It's worth noting that there have been occasions where the library was blocked and it took a few weeks to work around the block. For example, when a valid Sec-MS-Token became required, it took a while to implement it in the library: https://github.com/rany2/edge-tts/blob/08b10b931db3f788a506c...
Basically, it's a very bad idea to use this library for anything serious or mission-critical. It is also limited to plain text input (i.e., no custom SSML, emotion elements, etc.), as Microsoft restricts the API to only the features Microsoft Edge itself already supports. Commercial users generally want these more advanced features, so they'd want to use Azure Cognitive Services.
At any rate, this library was never really marketed, and I'm not sure how it blew up. It was really only intended so that I could have audio files to play back on my Home Assistant instance. Later, I started using it to generate e-books. In general, these are the two main uses of the library AFAIK.
Nice, I was thinking about launching an API because providers like Replicate have long queues. I think if you can nail down latency and concurrency, you may get a lot of users who need reliable, fast TTS.
Ah, I'm always looking for new ones, but it doesn't look like it supports SSML. Most engines have trouble with things like postal codes, names, and other implicit linguistic rules. Take this example:
> Melania Trump's zip code is 20001.
It says "Melaynia Trump's zip code is twenty-thousand one". With SSML, you can tell the engine the correct pronunciation and to say a string of numbers digit-by-digit. Spelling proper nouns differently to trick it into pronouncing it correctly works until it doesn't.
Being able to tell it to pronounce "Melania" like [ˌməˈlɑːn.jə] or [%m@"lA:n.j@] and tweak other aspects of the synthesis with SSML is, in my opinion, an important part of a commercial speech synthesis offering.
I wonder how much effort is needed to make these engines work with SSML. Kokoro+SSML would be awesome.
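For illustration, here is a minimal sketch of what the SSML for that example sentence could look like, built in Python. It assumes an engine that accepts the W3C SSML `phoneme` and `say-as` elements; support for `interpret-as="digits"` varies by engine, and per the thread edge-tts does not accept custom SSML at all. `build_ssml` is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET


def build_ssml(name: str, ipa: str, digits: str, lang: str = "en-US") -> str:
    """Build '<name> Trump's zip code is <digits>.' as an SSML string.

    Hypothetical helper for illustration only. Some services may require
    extra attributes on <speak> (for example a namespace declaration).
    """
    speak = ET.Element("speak", {"version": "1.0", "xml:lang": lang})

    # Tell the engine exactly how to pronounce the proper noun.
    phoneme = ET.SubElement(speak, "phoneme", {"alphabet": "ipa", "ph": ipa})
    phoneme.text = name
    phoneme.tail = " Trump's zip code is "

    # Ask for the number to be read digit by digit (assumed to be supported).
    say_as = ET.SubElement(speak, "say-as", {"interpret-as": "digits"})
    say_as.text = digits
    say_as.tail = "."

    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Melania", "ˌməˈlɑːn.jə", "20001"))
```

The resulting string would be sent to whichever SSML-capable service you use; how it is submitted depends on that service's API.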
Hey BrunoJo, I'd like to learn more about lemonfox.ai, but there does not seem to be information such as "about us" links. Your service looks worth investigating.
Interesting, I'm interested in something like this, but the page doesn't have much information.
- What languages are supported?
- How many voices are available?
- Is it possible to use without a monthly subscription? I'd rather pay only based on my usage (I don't use it every month).
For my use case, I'd need access to a wide variety of languages, and ideally 5+ voices per language. I'm currently using Amazon Polly, but I wonder if there's something better now.
Why would you pirate a TTS service when there are so many great options for local open source TTS now? Models like Fish and Kokoro and StyleTTSv2 are great and very fast.
Do you know of any commercially licensed TTS that supports 50+ languages and is relatively small (e.g. many small models, not 1 big model)? Meta's open models support something like 300 languages, but the license doesn't permit commercial use :-/
It's not running on the edge. It's a hack to use Microsoft's online TTS.
> edge-tts is a Python module that allows you to use Microsoft Edge's online text-to-speech service from within your Python code or using the provided edge-tts or edge-playback command.
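As a rough sketch of the command-line route mentioned in that description, the snippet below builds an `edge-tts` invocation from Python and runs it with `subprocess`. It assumes the `edge-tts` command is installed and on your PATH, and that the service is reachable; the helper names `edge_tts_command` and `synthesize` are hypothetical, and the voice name is only an example.

```python
import subprocess
from typing import List, Optional


def edge_tts_command(text: str, out_path: str, voice: Optional[str] = None) -> List[str]:
    """Build the argument list for the edge-tts CLI (hypothetical helper)."""
    cmd = ["edge-tts", "--text", text, "--write-media", out_path]
    if voice:
        # Optional: pick a specific voice instead of the default one.
        cmd += ["--voice", voice]
    return cmd


def synthesize(text: str, out_path: str, voice: Optional[str] = None) -> None:
    """Run edge-tts and raise CalledProcessError if it exits with an error."""
    subprocess.run(edge_tts_command(text, out_path, voice), check=True)


if __name__ == "__main__":
    # Needs network access and the edge-tts package installed.
    synthesize("Hello from the command line.", "hello.mp3", voice="en-US-AriaNeural")
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or other special characters.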
I have been using this for some time. It is pretty good, though not as good as ElevenLabs.
Also, ironically enough, ElevenLabs launched a reader app for iOS and Android, which lets you use text-to-speech for "free" with a limited selection of voices, but the app is not available for PC or as a browser extension. So it's like "we give you unlimited TTS, but only if you use your smartphone."
I like to use Edge on occasion when I need to read something dry but necessary, because I find that following along with the TTS and its auto-highlighting of text helps me stay focused and retain more.
Is there any equivalent program for ebooks? If not can someone build one? The dream would be to plop in an arbitrary document (pdf, docs, tex, epub, and so on) and have it read to me by a reasonable TTS at a speed of my choosing and have words / lines highlighted as the TTS goes along. Bonus points if you can regularly identify and skip things that are not necessarily relevant like page numbers, headers, footnote markers, and so on, which is something that Edge TTS within Edge struggles with when reading PDFs.
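The "skip what isn't relevant" part could start as a simple pre-processing pass over the extracted text before it reaches the TTS engine. The sketch below is a heuristic only: it assumes the document has already been extracted to a list of text lines, that footnote markers look like `[12]`, and that running headers show up as identical lines repeated several times. `clean_for_tts` and its `header_repeats` threshold are hypothetical names chosen for this example.

```python
import re
from collections import Counter
from typing import List

# A line that is only a page number, e.g. "12" or "Page 13".
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?\d+\s*$", re.IGNORECASE)
# Bracketed numeric footnote markers, e.g. "[1]".
FOOTNOTE_MARKER = re.compile(r"\[\d+\]")


def clean_for_tts(lines: List[str], header_repeats: int = 3) -> List[str]:
    """Drop page numbers, repeated running headers, and footnote markers."""
    stripped = [line.strip() for line in lines]
    counts = Counter(s for s in stripped if s)

    cleaned = []
    for s in stripped:
        if not s or PAGE_NUMBER.match(s):
            continue  # blank line or standalone page number
        if counts[s] >= header_repeats:
            continue  # identical line seen many times: likely a running header
        s = FOOTNOTE_MARKER.sub("", s)
        cleaned.append(re.sub(r"\s{2,}", " ", s).strip())
    return cleaned


if __name__ == "__main__":
    page_text = [
        "My Book Title", "It was a dark night.[1]", "12",
        "My Book Title", "The rain fell.", "Page 13",
        "My Book Title", "The end.",
    ]
    for line in clean_for_tts(page_text):
        print(line)
```

Real PDFs are messier than this (headers that change per chapter, superscript footnote markers, hyphenation), so treat the rules as a starting point to tune per document.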
I've been using https://readest.com/ lately. It's FOSS and just recently got this feature. The TTS voices are pretty natural and text is highlighted one sentence at a time. Plus the design of the product is great.
Can anyone just make a simple program that uses one of these better TTS engines? I just want a dialog box where you paste in the content you want converted and a big button that says "Generate" to receive an MP3 file. Fully compiled binaries for Linux, Windows, and Mac, please.
BrunoJo|1 year ago
rany_|1 year ago
qqqult|1 year ago
ipsum2|1 year ago
bilater|1 year ago
dqv|1 year ago
bsenftner|1 year ago
laurentlb|1 year ago
hobo_mark|1 year ago
modeless|1 year ago
Click the leaderboard tab here: https://huggingface.co/spaces/TTS-AGI/TTS-Arena
itake|1 year ago
I believe the Edge API supports more models:
https://gist.github.com/BettyJJ/17cbaa1de96235a7f5773b8690a2...
userbinator|1 year ago
noja|1 year ago
natebc|1 year ago
homarp|1 year ago
chopete3|1 year ago
nejsjsjsbsb|1 year ago
wiradikusuma|1 year ago
hexage1814|1 year ago
dcre|1 year ago
slyn|1 year ago
FireInsight|1 year ago
visarga|1 year ago
lf-non|1 year ago
gostsamo|1 year ago
jahsome|1 year ago
westcort|1 year ago
slig|1 year ago
erk__|1 year ago
https://www.pdq.com/blog/powershell-text-to-speech-examples/
VMtest|1 year ago
Now that I've tried it on desktop, it's really good! I might try using the Python script in the future.
gigel82|1 year ago
I'm curious, would this be the legal equivalent of "cracked" software in terms of piracy?
rany_|1 year ago
caseyy|1 year ago
userbinator|1 year ago
RobinHirst11|1 year ago
yapyap|1 year ago