
Show HN: Dia2, open-weights TTS model for realtime speech to speech

3 points | toebee | 3 months ago | github.com

Dia2 is an open-weights, streaming dialogue TTS model. It can start generating speech before it has received a full sentence, which makes it suitable for low-latency speech-to-speech systems. It can generate up to 2 minutes of English audio and supports audio prefixing.

The inference code and weights (1B / 2B variants) are available on GitHub and Hugging Face under the Apache 2.0 license, to accelerate research. This work was heavily influenced by KyutaiTTS, Mimi, and Sesame. We thank the TPU Research Cloud for providing computational resources.

2 comments

gac3|3 months ago

Was this trained on the same data as Dia 1?

gac3|3 months ago

Would be interesting to know which improvements come from the architecture, the data, and the different tokenizer.