I trained a 135M TTS model for ~$100, runs 20× real-time on CPU

4 points | sammyyyyyyy | 23 days ago | huggingface.co

2 comments

sammyyyyyyy | 23 days ago

Today, I'm releasing a new version of my side project: SoproTTS.

A 135M parameter TTS model trained for ~$100 on 1 GPU, running ~20× real-time on a base MacBook M3 CPU.

v1.5 highlights (on CPU):

• 250 ms TTFA streaming latency

• 0.05 RTF (~20× real-time)

• Zero-shot voice cloning

• Smaller, faster, more stable
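For anyone unfamiliar with the RTF figure above: real-time factor is usually defined as synthesis time divided by the duration of the audio produced, so lower is better and the speed-up is just its reciprocal. Here is a rough sketch of that arithmetic. The function names and the timing numbers are made up for illustration and are not taken from the Sopro repo.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup_over_real_time(rtf: float) -> float:
    """How many seconds of audio are produced per second of compute."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


if __name__ == "__main__":
    # Hypothetical measurement: 0.5 s of compute for 10 s of speech.
    rtf = real_time_factor(synthesis_seconds=0.5, audio_seconds=10.0)
    print(f"RTF: {rtf:.2f}, speed-up: {speedup_over_real_time(rtf):.0f}x")
```

Note that TTFA (time to first audio) is a separate measurement: it is the delay before the first streamed chunk arrives, not an average over the whole utterance.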

Still not perfect (OOD voices can be tricky, and there are still some artifacts), but a decent upgrade.

Repo: https://github.com/samuel-vitorino/sopro