Metricon | 11 months ago
To run the llama.cpp server: `llama-server -m C:\orpheus-3b-0.1-ft-q4_k_m.gguf -c 8192 -ngl 28 --host 0.0.0.0 --port 1234 --cache-type-k q8_0 --cache-type-v q8_0 -fa --mlock`
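A minimal sketch of talking to that server from a client, assuming the standard llama.cpp HTTP API: the server started above listens on port 1234 and accepts POST requests to `/completion` with a JSON body. The Orpheus prompt template shown here (`"<voice>: <text>"`) is an assumption for illustration; check the model card for the exact format.

```python
import json

# URL matches the --host 0.0.0.0 --port 1234 flags in the command above.
SERVER_URL = "http://localhost:1234/completion"

def build_request(text: str, voice: str = "tara") -> str:
    """Build the JSON body for a llama.cpp /completion call (sketch)."""
    payload = {
        # Assumed Orpheus-style prompt; verify against the model card.
        "prompt": f"{voice}: {text}",
        "n_predict": 1024,   # enough tokens for a few seconds of audio
        "temperature": 0.6,
        "stream": True,      # stream tokens so audio playback can start early
    }
    return json.dumps(payload)

body = build_request("Hello, this is a test")
# Send with e.g. urllib.request.Request(SERVER_URL, data=body.encode(),
# headers={"Content-Type": "application/json"}) and read the streamed reply.
```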
hexaga|11 months ago
Like most other tokens, they have text representations: '<custom_token_28631>', etc. You sample seven of them (one frame), parse out the ids, pass them through the SNAC decoder, and you now have a frame of audio from a 'text' pipeline.
The neat thing about this design is you can throw the model into any existing text-text pipeline and it just works.
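The token handling described above can be sketched like this: pull the numeric ids out of the `<custom_token_N>` text representations and group them into frames of seven for the SNAC decoder. The regex and frame grouping follow the comment; any further id arithmetic (codebook offsets, etc.) is left out since the thread doesn't specify it.

```python
import re

# Matches the text form of Orpheus audio tokens, e.g. '<custom_token_28631>'.
TOKEN_RE = re.compile(r"<custom_token_(\d+)>")
FRAME_SIZE = 7  # tokens per audio frame, per the comment above

def parse_frames(text: str) -> list[list[int]]:
    """Extract custom-token ids and group them into frames of 7.

    Each complete frame would then be passed to the SNAC decoder to
    produce one frame of audio; a trailing partial frame is dropped.
    """
    ids = [int(m) for m in TOKEN_RE.findall(text)]
    n = len(ids) - len(ids) % FRAME_SIZE  # keep only complete frames
    return [ids[i:i + FRAME_SIZE] for i in range(0, n, FRAME_SIZE)]

# Example with 15 synthetic tokens: two complete frames, one leftover token.
stream = "".join(f"<custom_token_{100 + i}>" for i in range(15))
frames = parse_frames(stream)
```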
gianpaj|11 months ago
You can run `python gguf_orpheus.py --text "Hello, this is a test" --voice tara` to connect to the llama-server.
See https://github.com/isaiahbjork/orpheus-tts-local
See my GH issue example output https://github.com/isaiahbjork/orpheus-tts-local/issues/15