d4rkp4ttern | 13 days ago
Sharing my setup in case it's useful for others; it works especially well with CLI agents like Claude Code or Codex-CLI:
STT: Hex [1] (open-source), with Parakeet V3: stunningly fast, near-instant transcription. The slight accuracy drop relative to bigger models is immaterial when you're talking to an AI. I always ask it to restate what it understood, and it gives back a nicely structured version. This confirms understanding and likely also helps the CLI agent stay on track. Hex is a native macOS app and leverages CoreML/Neural Engine to get extremely fast transcription. (I used to recommend a similar app, Handy, but it has frequent stuttering issues, and Hex is actually even faster, which I didn't think was possible!)
TTS: Kyutai's Pocket-TTS [2], just 100M params, with amazing speech quality (English only). I made a voice plugin [3] for Claude Code based on this, so it can speak short updates whenever CC stops. It uses a combination of hooks that nudge the main agent to append a speakable summary, falling back to a headless agent in case the main agent forgets. It turns out to be surprisingly useful. It's also fun, as you can customize the speaking style to mirror your vibe, "colorful language" and all.
The voice plugin gives commands to control it:
/voice:speak stop
/voice:speak azelma (change the voice)
/voice:speak prompt <your arbitrary prompt to control the style>
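For anyone curious how the "main agent appends a summary, otherwise fall back" flow could look, here is a minimal sketch. It is not the plugin's actual code: the `<speak>` marker, the function names, and the truncating fallback (a stand-in for the headless agent) are all assumptions made for illustration.

```python
import re
from typing import Callable, Optional

# Hypothetical marker convention; the real plugin may use something different.
SUMMARY_RE = re.compile(r"<speak>(.*?)</speak>", re.DOTALL)


def extract_spoken_summary(message: str) -> Optional[str]:
    """Return the speakable summary the agent appended, or None if it forgot."""
    match = SUMMARY_RE.search(message)
    if match is None:
        return None
    # Collapse newlines and repeated spaces so the TTS engine gets one clean line.
    text = " ".join(match.group(1).split())
    return text or None


def summary_to_speak(message: str, fallback: Callable[[str], str]) -> str:
    """Prefer the agent's own summary; otherwise ask the fallback for one."""
    summary = extract_spoken_summary(message)
    if summary is not None:
        return summary
    return fallback(message)


def truncate_fallback(message: str, max_words: int = 20) -> str:
    """Stand-in for the headless-agent fallback: keep only the first few words."""
    return " ".join(message.split()[:max_words])


if __name__ == "__main__":
    final_message = "Refactored the parser.\n<speak>Parser refactor done, tests pass.</speak>"
    print(summary_to_speak(final_message, truncate_fallback))
```

In a real setup the string returned by `summary_to_speak` would be handed to the TTS engine, and the fallback would call a separate summarizing agent instead of truncating.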
[1] Hex https://github.com/kitlangton/Hex
[2] Pocket-TTS https://github.com/kyutai-labs/pocket-tts
[3] Voice plugin for Claude Code: https://pchalasani.github.io/claude-code-tools/plugins-detai...
coppsilgold|12 days ago
I had cause to do the opposite: Hotkey -> clipboard TTS