
v7n | 1 year ago

It's not exactly what OP wants out-of-the-box, but if anyone is considering building one, I suggest taking a look at this.¹ It is really easy to tinker with and can run either on-device or in a client-server model. It has the required speech-to-text and text-to-speech endpoints, with multiple built-in options for each. If you can get the LLM assistant stage of the pipeline to perform translation to a degree you're comfortable with, this could be a solution.

¹ https://github.com/huggingface/speech-to-speech
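To make that concrete: repurposing the pipeline for translation mostly means swapping the assistant prompt in the LLM stage. A minimal sketch, where the function names and prompt wording are my own assumptions for illustration, not part of the speech-to-speech project:

```python
# Sketch: the LLM stage of a speech-to-speech pipeline repurposed for
# translation. The surrounding STT -> LLM -> TTS loop comes from the linked
# project; these stubs only illustrate the prompt swap.

def build_translation_prompt(transcript, target_language="Korean"):
    """Wrap the STT transcript in a translation instruction for the LLM."""
    return (
        f"Translate the following text to {target_language}. "
        f"Reply with the translation only, no commentary.\n\n{transcript}"
    )

def llm_stage(transcript, generate, target_language="Korean"):
    """`generate` stands in for whatever LLM callable the pipeline uses."""
    return generate(build_translation_prompt(transcript, target_language))
```

Whether this is reliable enough depends entirely on the model behind `generate`; a dedicated translation model (see the txtai reply below) avoids the prompt-following risk.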


dmezzetti|1 year ago

A similar option exists with txtai (https://github.com/neuml/txtai).

https://neuml.hashnode.dev/speech-to-speech-rag

https://www.youtube.com/watch?v=tH8QWwkVMKA

One would just need to remove the RAG piece and use a Translation pipeline (https://neuml.github.io/txtai/pipeline/text/translation/). They'd also need to use a Korean TTS model.

Both this and the Hugging Face speech-to-speech project are Python, though.
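The swap described above amounts to replacing the RAG step in the middle of the loop with a translate step. A minimal sketch of the resulting chain, using stub callables in place of the actual txtai Transcription, Translation, and TextToSpeech pipelines (the names and signatures below are illustrative, not txtai's API):

```python
# Sketch of a speech-to-speech translation loop: transcribe -> translate -> speak.
# In txtai these stages would map to its Transcription, Translation and
# TextToSpeech pipelines; the stubs just show the chaining with RAG removed.

def speech_to_speech(audio, transcribe, translate, speak, target="ko"):
    """Run one utterance through the three-stage chain."""
    text = transcribe(audio)              # speech -> source-language text
    translated = translate(text, target)  # text -> target-language text
    return speak(translated)              # text -> audio in target language
```

With txtai installed, the three callables would be instances of the corresponding pipelines, plus a Korean-capable TTS model as noted above.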

authorfly|1 year ago

Your library is quite possibly the best example of effortful, understandable, and useful work I have ever seen, principally evidenced by how you keep evolving with the times. I've watched you keep it up to date, even on the cutting edge, for years now, through multiple NLP mini-revolutions (sentence embeddings and their new uses) and what must have been the disruptive arrival of LLMs, and still push on to keep the library explainable and useful.

Code from txtai just feels like exactly the right way to express what I am usually trying to do in NLP.

My highest commendations. If you ever have time, please share your experience and what led you to take this path with txtai. For example, I see you started in earnest around August 2020 (maybe before); I would love to know whether, at that time, you imagined LLMs becoming as prominent as they are now, or instruction tuning working as well as it does. I know that back then, many PhD students (and profs) I knew in NLP felt LLMs were far too unreliable and would never reach, e.g., consistent scores on MMLU/HellaSwag.