item 44254723


travisvn | 8 months ago

Chatterbox is fantastic.

I created an API wrapper that also makes installation easier (Dockerized as well) https://github.com/travisvn/chatterbox-tts-api/

Best voice cloning option available locally by far, in my experience.


mistersquid | 8 months ago

> Chatterbox is fantastic.

> I created an API wrapper that also makes installation easier (Dockerized as well) https://github.com/travisvn/chatterbox-tts-ap

Gave your wrapper a try and, wow, I'm blown away by both Chatterbox TTS and your API wrapper.

Excuse the rudimentary level of what follows.

Was looking for a quick and dirty CLI incantation that reads a local text file instead of passing the text inline in the `input` field, but couldn't figure it out.

Pointers much appreciated.

travisvn | 8 months ago

This API wrapper was initially made to support a particular use case where someone's running, say, Open WebUI or AnythingLLM or some other local LLM frontend.

A lot of these frontends have an option for using OpenAI's TTS API, and some of them allow you to specify the URL for that endpoint, allowing for "drop-in replacements" like this project.

So the speech generation endpoint in the API is designed to fill that niche. Using it directly is pretty basic, though, and the README has curl commands for testing your setup.

Anyway, to get to your actual question, let me see if I can whip something up. I'll edit this comment with the command if I can swing it.

In the meantime, can I assume your local text files are actual `.txt` files?
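Assuming plain UTF-8 `.txt` files, one way to skip curl is a small Python script that reads the file and sends its contents as the `input` field. This is a sketch, not something shipped with the wrapper: the `localhost:4123` address and the `/v1/audio/speech` route (modeled on OpenAI's TTS API, which the wrapper imitates) are assumptions, as is the `.wav` default output name, so check them against the README.

```python
import argparse
import json
import urllib.request
from pathlib import Path

# ASSUMPTION: host, port, and route are guesses modeled on OpenAI's
# /v1/audio/speech endpoint; check the wrapper's README for the real values.
DEFAULT_URL = "http://localhost:4123/v1/audio/speech"


def build_payload(text_path: Path) -> bytes:
    """Read a UTF-8 .txt file and wrap its contents in a JSON `input` field."""
    text = text_path.read_text(encoding="utf-8").strip()
    if not text:
        raise ValueError(f"{text_path} is empty")
    # Only `input` is sent; add OpenAI-style fields (e.g. "voice") if your server wants them.
    return json.dumps({"input": text}).encode("utf-8")


def synthesize(text_path: Path, out_path: Path, url: str = DEFAULT_URL) -> None:
    """POST the file's text to the speech endpoint and write the returned audio bytes."""
    request = urllib.request.Request(
        url,
        data=build_payload(text_path),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        out_path.write_bytes(response.read())


def main() -> None:
    parser = argparse.ArgumentParser(description="Speak a local .txt file via a TTS endpoint.")
    parser.add_argument("text_file", type=Path, nargs="?", help="UTF-8 text file to read")
    # ASSUMPTION: the server returns WAV; rename the output if it sends another format.
    parser.add_argument("-o", "--output", type=Path, default=Path("speech.wav"))
    parser.add_argument("--url", default=DEFAULT_URL)
    args = parser.parse_args()
    if args.text_file is None:
        # No file given: show usage instead of failing.
        parser.print_help()
        return
    synthesize(args.text_file, args.output, args.url)


if __name__ == "__main__":
    main()
```

Saved as, say, a hypothetical `speak_file.py`, the incantation would be `python speak_file.py chapter1.txt -o chapter1.wav`. Using only the standard library keeps it free of extra installs.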

nitroedge | 8 months ago

Spent an hour trying to get it running on an RTX 50-series card with PyTorch 2.7, with no luck.

Seems built for 2.6.

"chatterbox-tts 0.1.2 requires torch==2.6.0, but you have torch 2.7.0+cu128 which is incompatible. chatterbox-tts 0.1.2 requires torchaudio==2.6.0, but you have torchaudio 2.7.0+cu128 which is incompatible."
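Before reinstalling anything, it can help to confirm exactly which pins are unmet. Below is a minimal sketch using the standard library's `importlib.metadata`; the two pins are copied from the error message above, and stripping suffixes like `+cu128` mirrors how pip ignores local version labels when the requirement has none. It only diagnoses the conflict; it does not make chatterbox-tts 0.1.2 work with torch 2.7. The helper names are hypothetical.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip error above (chatterbox-tts 0.1.2).
REQUIRED = {"torch": "2.6.0", "torchaudio": "2.6.0"}


def base_version(v: str) -> str:
    """Drop a local version label, e.g. '2.7.0+cu128' -> '2.7.0'."""
    return v.split("+", 1)[0]


def find_mismatches(required: dict[str, str]) -> dict[str, str | None]:
    """Map each unsatisfied package to its installed version (None if not installed)."""
    mismatches: dict[str, str | None] = {}
    for package, wanted in required.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            mismatches[package] = None
            continue
        if base_version(installed) != wanted:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    for package, installed in find_mismatches(REQUIRED).items():
        print(f"{package}: need {REQUIRED[package]}, found {installed or 'not installed'}")
```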

venusenvy47 | 8 months ago

Would this be usable on a PC without a GPU?

travisvn | 8 months ago

It can definitely run on CPU, but I'm not sure whether it can run on a machine with no GPU at all.

To be honest, it uses a decently large amount of resources. With a GPU, you can expect about 4-5 GB of memory usage. And given how much of the tensor work is optimized for GPUs, I'm not sure how well things would work "CPU only".

If you try it, let me know. There are some "CPU" Docker builds in the repo you could look at for guidance.
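For anyone trying a CPU-only box, a quick first step is checking what PyTorch can actually see. This is a generic PyTorch sketch, not code from the wrapper or its CPU Docker builds, and `pick_device` is a hypothetical helper name.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch isn't installed, so there is no GPU path to offer.
        return "cpu"
    # CPU-only PyTorch builds simply report False here.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

If I'm reading the Chatterbox README correctly, the result can be passed as `ChatterboxTTS.from_pretrained(device=pick_device())`; verify that against the current README, and expect generation on `"cpu"` to be slower, likely by a lot.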

If you want free TTS without using local resources, you could try edge-tts https://github.com/travisvn/openai-edge-tts