yberreby | 25 days ago
I often have ideas while cleaning, cooking, etc. Claude Code (with Opus 4.5) is very capable, and I've long wanted to get it working hands-free.
So I took an afternoon and rolled my own STT-TTS voice stack for Claude Code. The voice stack runs locally on my M4 Pro and is extremely fast.
For Speech to Text, Parakeet v3 TDT: https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3
For Text to Speech, Pocket TTS: https://github.com/kyutai-labs/pocket-tts
A custom MCP server hooks this into Claude Code, with a little bit of hacking around to capture my AirPods' stem click.
I have Claude narrate its thought process and everything it's doing in short, frequent messages. I can interrupt it at any time with a stem click, which starts listening to me; the message is sent once a sufficiently long pause is detected.
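As an illustration of the "send once a sufficiently long pause is detected" step, here is a minimal sketch of energy-based endpointing. It is not the author's code: the function names, the RMS threshold, and the frame and pause durations are all assumptions chosen for the example, and a real setup might use a proper voice activity detector instead.

```python
import math
from typing import Iterable, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples assumed in [-1, 1])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def capture_until_pause(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,  # assumed energy level separating speech from silence
    frame_ms: int = 30,       # assumed duration of each frame
    pause_ms: int = 900,      # assumed pause length that ends the utterance
) -> List[Sequence[float]]:
    """Collect frames until a long enough run of silence follows some speech.

    Returns the captured frames with the trailing silence removed. If the
    stream ends before a long pause occurs, everything captured is returned.
    """
    needed = max(1, pause_ms // frame_ms)  # silent frames required to stop
    captured: List[Sequence[float]] = []
    silent_run = 0
    heard_speech = False

    for frame in frames:
        captured.append(frame)
        if rms(frame) >= threshold:
            heard_speech = True
            silent_run = 0  # any speech resets the pause counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return captured[:-silent_run]  # drop the trailing pause
    return captured
```

The silence counter only starts once some speech has been heard, so a slow start after the click does not end the capture immediately.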
I stream the Claude Code session via AirPlay to my living room TV, so that I don't have to get close to the laptop if I need extra details about what it's doing.
Yesterday, I had it debug a custom WhatsApp integration (via [1]) hands-free while brushing my teeth. It can use `osascript` for OS integration, browse the web via Claude Code's built-in tools...
My back is thankful. This is really fun.
gdhkgdhkvff|24 days ago
On the other hand, it makes me wonder if we’re heading for a future where everyone is always working, at all times, even while doing other things.
“Wow look at our daughter taking her first steps! She’s doing so… wait hold on… No, Claude. I said to name the class “potatoes”, not “‘pot’ followed by eight ‘O’s,” you dumb robot!”
mpolichette|24 days ago
Rather than the example of missing first steps, what if we had, "OK Claude, prepare a few slides for my presentation, I'm going to watch my child's midday recital..."? Maybe you get a success/failure ping, and maybe you even need to step out for part of the event, but in another world you couldn't have gone at all.
volkk|24 days ago
unknown|24 days ago
[deleted]
yberreby|24 days ago
For context, I'm a PhD student. Work-life balance is already... elusive.
andai|24 days ago
vmbm|24 days ago
mrbeep|24 days ago
https://github.com/dgrr/tgcli
It does sync, messages, send, search (FTS5), stickers, forum topics, and has socket IPC for automation. Install via cargo or homebrew. Seems like it'd fit nicely into what you're building.
yberreby|24 days ago
azisk1|21 days ago
yberreby|19 days ago
yberreby|19 days ago
Jommi|24 days ago
yberreby|24 days ago
The main non-trivial parts are proper state machine / concurrency management, and AirPods interaction; in particular, detecting a stem click while the microphone is active. I worked around this by having the mic-off-to-mic-on transition use a media player Play event, and mic-on-to-mic-off do silence detection. It's super hacky but actually works surprisingly well.
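To make the state machine part concrete, here is a minimal sketch of the two transitions described above: a Play event turns the mic on, and detected silence turns it off. The class and event names are hypothetical, and the lock is just one simple way to keep transitions consistent when events arrive from different threads; the actual implementation may look quite different.

```python
import enum
import threading


class MicState(enum.Enum):
    OFF = "off"
    ON = "on"


class Event(enum.Enum):
    PLAY_CLICK = "play_click"  # stem click surfaced as a media Play event
    SILENCE = "silence"        # silence detector reports a long pause


# Only the two transitions from the description are allowed.
_TRANSITIONS = {
    (MicState.OFF, Event.PLAY_CLICK): MicState.ON,
    (MicState.ON, Event.SILENCE): MicState.OFF,
}


class VoiceLoop:
    """Hypothetical mic state holder; safe to call from several threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state = MicState.OFF

    @property
    def state(self) -> MicState:
        with self._lock:
            return self._state

    def handle(self, event: Event) -> bool:
        """Apply an event; return True if it caused a transition."""
        with self._lock:
            next_state = _TRANSITIONS.get((self._state, event))
            if next_state is None:
                return False  # event is not valid in this state, so ignore it
            self._state = next_state
            return True
```

Events that make no sense in the current state, such as a silence report while the mic is already off, are ignored rather than raising, which keeps stray callbacks from corrupting the state.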
Currently looking into using `AVAudioApplication.setInputMuteStateChangeHandler(_:)`, like AirMute [1] does, so that I don't have to rely on silence detection and can manually terminate the voice command with a second click.
If you want to roll your own version of what I described, it should be pretty easy: if you have a Max 5x or 20x plan, feed what I wrote here to Opus. Bonus points, you get to customize it to your exact needs.
[1]: https://github.com/Solarphlare/AirMute