top | item 42801167

(no title)

BrunoJo | 1 year ago

I wouldn't use Edge TTS for commercial projects since it's using an internal Microsoft API that was reverse engineered.

If you are looking for a commercial API, I just launched a TTS API powered by the the best performing open source model Kokoro: https://www.lemonfox.ai/text-to-speech-api. The API is compatible with OpenAI and ElevenLabs and up to 25x cheaper.

discuss

order

rany_|1 year ago

It's worth noting that there have been occasions where the library was blocked and it took a few weeks to workaround said block. For example, when a valid Sec-MS-Token became required, it took a while to implement it in the library: https://github.com/rany2/edge-tts/blob/08b10b931db3f788a506c...

Basically, it's a very bad idea to use this library for anything serious/mission critical. It also is really limited to only taking in text (i.e., no custom SSML, emotion elements, etc) as Microsoft restricts the API to only the features Microsoft Edge itself already supports. Generally commercial users would want these more advanced features and so they'd want to use Azure Cognitive Services.

At any rate this library was never really marketed, I'm not sure how it blew up. It was really only intended so that I can have audio files I can play back for my Home Assistant instance. Later, I started using it to generate e-books. In general, these are the two main uses of the library AFAIK.

ghxst|1 year ago

> no custom SSML

I believe this used to be available for edge tts, very sad to see they removed it.

If anyone knows of comparable projects that implement something like SSML please do share.

bilater|1 year ago

Nice I was thinking about launching an API because providers like Replicate have long queues. I think if you can nail down latency and concurrency you may get a lot of users who need reliable fast TTS.

dqv|1 year ago

Ah, I'm always looking for new ones, but it doesn't look like it supports SSML. Most engines have trouble with things like postal codes, names, and other implicit linguistic rules. Take the example

> Melania Trump's zip code is 20001.

It says "Melaynia Trump's zip code is twenty-thousand one". With SSML, you can tell the engine the correct pronunciation and to say a string of numbers digit-by-digit. Spelling proper nouns differently to trick it into pronouncing it correctly works until it doesn't.

Being able to tell it to pronounce "Melania" like [ˌməˈlɑːn.jə] or [%m@"lA:n.j@] and tweak other aspects of the synthesis with SSML is, in my opinion, an important part of a commercial speech synthesis offering.

I wonder how much effort is needed to make these engines work with SSML. Kokoro+SSML would be awesome.

bsenftner|1 year ago

Hey BrunoJo, I'd like to learn more about lemonfox.ai, but there does not seem to be information such as "about us" links. Your service looks worth investigating.

laurentlb|1 year ago

Interesting, I'm interested in something like this, but the page doesn't have much information. - What languages are supported? - How many voices are available? - Is it possible to use without a monthly subscription? I'd rather pay only based on my usage (I don't use it every month).

For my use case, I'd need access to a wide variety of languages, and ideally 5+ voices per language. I'm currently using Amazon Polly, but I wonder if there's something better now.

hobo_mark|1 year ago

I wish Kokoro supported SSML... Is there a way to explicitly emphasize parts of the text?