It's not exactly what OP wants out of the box, but if anyone is considering building one, I suggest taking a look at this.¹ It is really easy to tinker with and can run either on-device or in a client-server model.
It has the required speech-to-text and text-to-speech endpoints, with multiple built-in options for each. If you can get the LLM assistant part of the pipeline to perform translation to a degree you're comfortable with, this could be a solution.
¹ https://github.com/huggingface/speech-to-speech
dmezzetti|1 year ago
https://neuml.hashnode.dev/speech-to-speech-rag
https://www.youtube.com/watch?v=tH8QWwkVMKA
One would just need to remove the RAG piece and use a Translation pipeline (https://neuml.github.io/txtai/pipeline/text/translation/). They'd also need to use a Korean TTS model.
Both this and the Hugging Face speech-to-speech projects are Python though.
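For anyone wanting to see the shape of that swap, here is a minimal sketch of a transcribe -> translate -> synthesize chain. It uses only the standard library, and every name in it (SpeechTranslator, the fake_* functions) is hypothetical, not txtai's API. The three stages are plain callables, so real speech-to-text, translation and Korean TTS models could be dropped in where the stand-ins are.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslator:
    """Chains three stages: speech-to-text, translation, text-to-speech.

    Hypothetical helper for illustration only; not part of txtai or the
    Hugging Face speech-to-speech project.
    """
    transcribe: Callable[[bytes], str]      # audio -> source-language text
    translate: Callable[[str, str], str]    # (text, target language) -> text
    synthesize: Callable[[str], bytes]      # target-language text -> audio
    target: str = "ko"                      # Korean, as in the thread

    def __call__(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)
        translated = self.translate(text, self.target)
        return self.synthesize(translated)


# Stand-in stages so the sketch runs without any models installed.
# Real stages would wrap actual STT, translation and TTS models.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, target: str) -> str:
    return f"[{target}] {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechTranslator(fake_transcribe, fake_translate, fake_synthesize)
result = pipeline(b"hello")
```

The translation stage takes the place the RAG step would otherwise occupy, so the rest of the chain stays untouched.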
authorfly|1 year ago
Code from txtai just feels like exactly the right way to express what I am usually trying to do in NLP.
My highest commendations. If you ever have time, please share your experience and what led you to take this path with txtai. For example, I see you started in earnest around August 2020 (maybe before). I would love to know whether, at that time, you imagined LLMs becoming as prominent as they are now and instruction tuning working as well as it does. I know that at that time many NLP PhD students (and professors) I knew felt LLMs were far too unreliable and would not reach, e.g., consistent scores on MMLU/HellaSwag.