rohan_joshi|6 months ago
The headline feature isn’t the 25 MB footprint alone. It’s that KittenTTS is Apache-2.0. That combo means you can embed a fully offline voice in Pi Zero-class hardware or even battery-powered toys without worrying about GPUs, cloud calls, or restrictive licenses. In one stroke it turns voice everywhere from a hardware/licensing problem into a packaging problem. Quality tweaks can come later; unlocking that deployment tier is the real game-changer.
woadwarrior01|6 months ago
Have you seen the code[1] in the repo? It uses phonemizer[2], which is GPL-3.0-licensed. In its current state, it's effectively GPL-licensed.
[1]: https://github.com/KittenML/KittenTTS/blob/main/kittentts/on...
[2]: https://github.com/bootphon/phonemizer
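This kind of transitive-license problem is easy to miss. As a generic sketch (not specific to KittenTTS, and relying only on whatever License metadata packages happen to declare), you can scan an environment for copyleft dependencies with the standard library:

```python
from importlib import metadata

def installed_licenses():
    """Collect the declared License metadata for every installed
    distribution, to help spot copyleft dependencies in a tree."""
    licenses = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or "unknown"
        licenses[name] = dist.metadata.get("License") or "unspecified"
    return licenses

# Flag anything whose declared license string mentions GPL
# (this also matches LGPL; metadata quality varies by package).
gpl_deps = {n: l for n, l in installed_licenses().items() if "GPL" in l}
```

It only reads declared metadata, so it's a first pass, not a legal audit.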
Edit: It looks like I replied to an LLM generated comment.
oezi|6 months ago
And it isn't something you can fix, because the model was trained on bad phonemes (everyone transcribes with Whisper and then phonemizes the text transcript).
gorgoiler|6 months ago
If my MIT-licensed one-line Python library has this line of code…
…I’m not suddenly subject to bash’s licensing. Anyone who wants to run my stuff, though, will need to make sure they have bash installed. (But, to argue against my own point: if an OS vendor ships my library alongside a copy of bash, do they now have to relicense my library as GPL?)
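For illustration, a hypothetical version of that one-liner (the actual line is elided above) might shell out to bash as a separate process, which is generally treated differently from linking against GPL code:

```python
import subprocess

def run_in_bash(command: str) -> str:
    """Invoke bash (GPL-3.0) as a separate process at runtime.
    The library executes it; it does not link against or embed it."""
    result = subprocess.run(
        ["bash", "-c", command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Calling `run_in_bash("echo hi")` returns `"hi"` on a system where bash is on the PATH; whether the distribution scenario in the parenthetical changes anything is the open question.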
Hackbraten|6 months ago
[0]: https://www.gnu.org/licenses/license-list.html#apache2
keyKeeper|6 months ago
Morals may stop you, but other than that? IMHO, all open source code is public domain code if anyone is willing to spend some AI tokens.
defanor|6 months ago
eSpeak NG's data files take about 12 MB (multi-lingual).
I guess this one may generate more natural-sounding speech, but older or lower-end computers were capable of decent speech synthesis previously as well.
Joel_Mckay|6 months ago
$ ls -lh /usr/bin/flite
Listed as 27K last I checked.
I recall some Blind users were able to decode Gordon 8-bit dialogue at speeds most people found incomprehensible. =3
pjc50|6 months ago
What about the training data? Is everyone 100% confident that models are not a derived work of the training inputs now, even if they can reproduce input exactly?
entropie|6 months ago
I am curious how fast this is with CPU only.
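One common way to answer that is to measure the real-time factor (RTF): synthesis wall time divided by the duration of the audio produced. A minimal sketch, with a stand-in `fake_synth` since KittenTTS's actual API isn't shown here:

```python
import math
import time

def real_time_factor(synthesize, text: str, sample_rate: int = 22050) -> float:
    """RTF = wall-clock synthesis time / audio duration.
    RTF < 1.0 means the engine runs faster than real time."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return elapsed / (len(samples) / sample_rate)

def fake_synth(text: str):
    """Stand-in synthesizer: pretend each character yields 50 ms of a
    440 Hz tone. Replace with a real model call to benchmark it."""
    n = int(len(text) * 0.05 * 22050)
    return [math.sin(2 * math.pi * 440 * i / 22050) for i in range(n)]
```

Swap `fake_synth` for the real model's synthesis call and the RTF on a given CPU drops out directly.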