top | item 44807476

(no title)

The headline feature isn’t the 25 MB footprint alone. It’s that KittenTTS is Apache-2.0. That combo means you can embed a fully offline voice in Pi Zero-class hardware or even battery-powered toys without worrying about GPUs, cloud calls, or restrictive licenses. In one stroke it turns voice everywhere from a hardware/licensing problem into a packaging problem. Quality tweaks can come later; unlocking that deployment tier is the real game-changer.

discuss

rohan_joshi|6 months ago

yeah, we are super excited to build tiny ai models that are super high quality. local voice interfaces are inevitable and we want to power those in the future. btw, this model is just a preview, and the full release next week will be of much higher quality, along w another ~80M model ;)

woadwarrior01|6 months ago

> It’s that KittenTTS is Apache-2.0

Have you seen the code[1] in the repo? It uses phonemizer[2] which is GPL-3.0 licensed. In its current state, it's effectively GPL licensed.

[1]: https://github.com/KittenML/KittenTTS/blob/main/kittentts/on...

[2]: https://github.com/bootphon/phonemizer

Edit: It looks like I replied to an LLM generated comment.

oezi|6 months ago

The issue is even bigger: phonemizer is using espeak-ng, which isn't very good at turning graphemes into phonemes. In other TTS which rely on phonemes (e.g. Zonos) it turned out to be one of the key issues which cause bad generations.

And it isn't something you can fix, because the model was trained on bad phonemes (everyone uses Whisper + then phonemizes the text transcript).

jacereda|6 months ago

https://github.com/KittenML/KittenTTS/issues/17

gorgoiler|6 months ago

This would only apply if they were distributing the GPL licensed code alongside their own code.

If my MIT-licensed one-line Python library has this line of code…

  run([“bash”, “-c”, “echo hello”])

…I’m not suddenly subject to bash’s licensing. For anyone wanting to run my stuff though, they’re going to need to make sure they themselves have bash installed.

(But, to argue against my own point, if an OS vendor ships my library alongside a copy of bash, do they have to now relicense my library as GPL?)

Hackbraten|6 months ago

Given that the FSF considers Apache-2.0 to be compatible with GPL-3.0 [0], how could the fact that phonemizer is GPL-3.0 possibly be an issue?

[0]: https://www.gnu.org/licenses/license-list.html#apache2

keyKeeper|6 months ago

Okay, what's stopping you from feeding the code into an LLM and re-write it and make it yours? You can even add extra steps like make it analyze the code block by block then supervise it as it is rewriting it. Bam. AI age IP freedom.

Morals may stop you but other than that? IMHO all open source code is public domain code if anyone is willing to spend some AI tokens.

defanor|6 months ago

A Festival's English model, festvox-kallpc16k, is about 6 MB, and it is a large model; festvox-kallpc8k is about 3.5 MB.

eSpeak NG's data files take about 12 MB (multi-lingual).

I guess this one may generate more natural-sounding speech, but older or lower-end computers were capable of decent speech synthesis previously as well.

Joel_Mckay|6 months ago

Custom voices could be added, but the speed was more important to some users.

$ ls -lh /usr/bin/flite

Listed as 27K last I checked.

I recall some Blind users were able to decode Gordon 8-bit dialogue at speeds most people found incomprehensible. =3

pjc50|6 months ago

> KittenTTS is Apache-2.0

What about the training data? Is everyone 100% confident that models are not a derived work of the training inputs now, even if they can reproduce input exactly?

entropie|6 months ago

I play around with a nvidia jetson orin nano super right now and its actually pretty usuable with gemma3:4b and quite fast - even image processing is done in like 10-20 seconds but this is with GPU support. When something is not working and ollama is not using the GPU this calls take ages because the cpu is just bad.

Iam curious how fast this is with CPU only.

phh|6 months ago

It depends on espeak-ng which is GPLv3

ethan_smith|6 months ago

This opens up voice interfaces for medical devices, offline language learning tools, and accessibility gadgets for the visually impaired - all markets where cloud dependency and proprietary licenses were showstoppers.

Narishma|6 months ago

But Pi Zero has a GPU, so why not make use of it?

a96|6 months ago

Because then you're stuck on that device only.

CyberDildonics|6 months ago

The github just has a few KB of python that looks like an install script. How is this used from C++ ?

unknown|6 months ago

[deleted]

unknown|6 months ago

[deleted]