v7n | 1 year ago

I gave this a shot using speech-to-speech¹, modified so that it skips the LLM/AI assistant part and just repeats back what it thinks I said while displaying the text.

For longer sentences, my perception is that Moonshine performs at 80-90% of what Whisper² could do while using considerably fewer resources. On shorter, two-word utterances its accuracy nosedived for some reason.
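For anyone who wants a number instead of a by-ear impression, word error rate (WER) is the usual way to score a transcript against what was actually said. A minimal sketch in plain Python; the function name `word_error_rate` and the lowercase-and-split normalization are assumptions made for illustration, not anything taken from speech-to-speech, Moonshine or Whisper:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

One thing this makes visible: on a two-word utterance a single wrong word is already a WER of 0.5, so short phrases will look (and feel) much worse than long sentences with the same number of mistakes. That may be part of the nosedive, though it is only a guess.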

These numbers don't mean much, but with each one paired with MeloTTS, the Moonshine and Whisper² setups ate up 1.2 GB and 2.5 GB of my GPU's memory, respectively.

¹ https://github.com/huggingface/speech-to-speech
² distil-whisper/distil-large-v3
