New Kitten TTS V0.8 models are out in three variants: 80M, 40M, and 14M. The largest model has the highest quality. The 14M variant sets a new SOTA in expressivity among similarly sized models, despite being <25MB in size. All models are highly expressive and realistic, with high-quality voices.

Kitten TTS is an open-source series of tiny, expressive text-to-speech models for on-device applications, built by KittenML (with <3). This release supports English text-to-speech in eight voices: four male and four female. Most models are quantized to int8 + fp16 and use ONNX for runtime. The models are designed to run literally anywhere, e.g. Raspberry Pi, low-end smartphones, wearables, browsers, etc. No GPU required!

This release bridges the gap between on-device and cloud models for TTS applications. Multilingual support is planned for the future.
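As a rough sanity check on the size figures, here is a back-of-envelope sketch in Python. It assumes raw weight storage only (no file overhead, no non-weight tensors) and standard dtype widths. The helper names are made up for illustration and are not part of the Kitten TTS code. The post does not state the exact int8/fp16 split, so this only shows the range: 14M parameters come to 14 MB at pure int8 and 28 MB at pure fp16, so a file under 25MB is consistent with a mix of the two.

```python
# Back-of-envelope weight sizes. Parameter counts come from the post;
# bytes-per-parameter are the standard widths of each dtype.
# Function names here are hypothetical, for illustration only.
PARAMS = {"80M": 80_000_000, "40M": 40_000_000, "14M": 14_000_000}
BYTES_PER_PARAM = {"int8": 1, "fp16": 2, "fp32": 4}


def weight_megabytes(n_params, dtype):
    """Raw weight size in MB (10**6 bytes), ignoring any file overhead."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6


def size_table():
    """Estimated size of every variant at every dtype."""
    return {
        name: {dtype: weight_megabytes(n, dtype) for dtype in BYTES_PER_PARAM}
        for name, n in PARAMS.items()
    }


if __name__ == "__main__":
    for name, row in size_table().items():
        print(name, row)
```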

We'd love your feedback! On-device AI is currently bottlenecked by the availability of tiny, performant models. We're trying to change that by releasing open-source models that can unlock on-device voice agents and applications over the next few months.

Code, weights, and more information are available on our GitHub: https://github.com/KittenML/KittenTTS

999900000999|10 days ago

Some actual audio examples would be nice. I'd like to hear what this sounds like before taking the time to run it.

rohan_joshi|10 days ago

we also launched on Reddit and got great feedback on r/LocalLLaMA. the video with samples is posted there too.

rohan_joshi|10 days ago

hi, the README in the GitHub repo has a video. all of the audio in it was generated by the models ^^

would love the feedback.