Metricon | 11 months ago
To run the llama.cpp server: `llama-server -m C:\orpheus-3b-0.1-ft-q4_k_m.gguf -c 8192 -ngl 28 --host 0.0.0.0 --port 1234 --cache-type-k q8_0 --cache-type-v q8_0 -fa --mlock`
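A minimal sketch of talking to that server from a client, assuming the standard llama.cpp HTTP API: the server started above listens on port 1234 and accepts POST requests to `/completion` with a JSON body. The Orpheus prompt template shown here (`"<voice>: <text>"`) is an assumption for illustration; check the model card for the exact format.

```python
import json

# URL matches the --host 0.0.0.0 --port 1234 flags in the command above.
SERVER_URL = "http://localhost:1234/completion"

def build_request(text: str, voice: str = "tara") -> str:
    """Build the JSON body for a llama.cpp /completion call (sketch)."""
    payload = {
        # Assumed Orpheus-style prompt; verify against the model card.
        "prompt": f"{voice}: {text}",
        "n_predict": 1024,   # enough tokens for a few seconds of audio
        "temperature": 0.6,
        "stream": True,      # stream tokens so audio playback can start early
    }
    return json.dumps(payload)

body = build_request("Hello, this is a test")
# Send with e.g. urllib.request.Request(SERVER_URL, data=body.encode(),
# headers={"Content-Type": "application/json"}) and read the streamed reply.
```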
hexaga|11 months ago
Like most other tokens, they have text representations: '<custom_token_28631>', etc. You sample seven of them (one frame), parse out the ids, pass them through the SNAC decoder, and you now have a frame of audio from a 'text' pipeline.
The neat thing about this design is you can throw the model into any existing text-text pipeline and it just works.
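The token handling described above can be sketched like this: pull the numeric ids out of the `<custom_token_N>` text representations and group them into frames of seven for the SNAC decoder. The regex and frame grouping follow the comment; any further id arithmetic (codebook offsets, etc.) is left out since the thread doesn't specify it.

```python
import re

# Matches the text form of Orpheus audio tokens, e.g. '<custom_token_28631>'.
TOKEN_RE = re.compile(r"<custom_token_(\d+)>")
FRAME_SIZE = 7  # tokens per audio frame, per the comment above

def parse_frames(text: str) -> list[list[int]]:
    """Extract custom-token ids and group them into frames of 7.

    Each complete frame would then be passed to the SNAC decoder to
    produce one frame of audio; a trailing partial frame is dropped.
    """
    ids = [int(m) for m in TOKEN_RE.findall(text)]
    n = len(ids) - len(ids) % FRAME_SIZE  # keep only complete frames
    return [ids[i:i + FRAME_SIZE] for i in range(0, n, FRAME_SIZE)]

# Example with 15 synthetic tokens: two complete frames, one leftover token.
stream = "".join(f"<custom_token_{100 + i}>" for i in range(15))
frames = parse_frames(stream)
```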
gianpaj|11 months ago
You can run `python gguf_orpheus.py --text "Hello, this is a test" --voice tara` to connect to the llama-server.
See https://github.com/isaiahbjork/orpheus-tts-local
See my GH issue example output https://github.com/isaiahbjork/orpheus-tts-local/issues/15