Nice work on the pure Go implementation. I built a similar pipeline for generating audio versions of articles, and the pricing trade-offs are tough at scale. ElevenLabs is obviously the quality winner, but their per-character pricing eats up all the margin if you're doing anything high-volume. I've found Deepgram to be the most pragmatic choice lately, since OpenAI's prosody can be a bit flat on longer texts.
storystarling|1 month ago
schappim|1 month ago
ElevenLabs is awesome for speech generation (nothing beats it), but their speech-to-text is terrible, especially for voice activity detection.