What's an example use case for something like this? "At the edge" makes me think offline, but in that case are you generating audio any faster than real time?
Would be curious to see an even lower cost/lower power option. Seems this one is $120-170.
This is for speech to text, so it generates text, not audio. And on a $120-$170 device, this transcribes at 30x real time. The code does run on lower-end Rockchip processors, costing ~$30, although only at 10x real time.
Sorry for the confusing phrasing about STT vs TTS. I'm not familiar with cases where you would use something like this 'at the edge' instead of, say, a laptop. I was thinking maybe some sort of offline setup with a microphone -- but in that case the audio just arrives in real time. Do you have some use cases in mind?
1/4 of the price for 1/3 of the speed is a good deal! Presumably still faster than faster-whisper on the same hardware?
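For anyone who wants to sanity-check the price/speed trade-off, here is a rough sketch using only the figures quoted in this thread (30x real time at $120-$170, 10x at ~$30). The function names and device labels are made up for illustration, and I'm taking the low end of the price range for the faster board.

```python
# Back-of-the-envelope comparison using the figures quoted in this thread.
# Function names and device labels are hypothetical, not from the project.

def transcription_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock time to transcribe audio at a given multiple of real time."""
    return audio_seconds / realtime_factor


def speed_per_dollar(realtime_factor: float, price_usd: float) -> float:
    """Real-time multiples of throughput bought per dollar of hardware."""
    return realtime_factor / price_usd


# (real-time factor, price in USD); $120 is the low end of the quoted range.
devices = {
    "higher-end board": (30.0, 120.0),
    "low-end Rockchip board": (10.0, 30.0),
}

one_hour = 3600.0  # seconds of audio
for name, (factor, price) in devices.items():
    seconds = transcription_seconds(one_hour, factor)
    value = speed_per_dollar(factor, price)
    print(f"{name}: {seconds:.0f} s per hour of audio, {value:.2f}x per dollar")
```

On these numbers the cheaper board gets more throughput per dollar, and the gap widens if the faster board is priced at the $170 end.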
keveman|2 years ago
smpanaro|2 years ago