item 46899573


yberreby | 25 days ago

Watching the OpenClaw/Molbot craze has been entertaining. I wouldn't use it - too much code, changing too quickly, with too little regard for security - but it has inspired me.

I often have ideas while cleaning, cooking, etc. Claude Code (with Opus 4.5) is very capable. I've long wanted to get Claude Code working hands-free.

So I took an afternoon and rolled my own STT-TTS voice stack for Claude Code. The voice stack runs locally on my M4 Pro and is extremely fast.

For Speech to Text, Parakeet v3 TDT: https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3

For Text to Speech, Pocket TTS: https://github.com/kyutai-labs/pocket-tts

Custom MCP to hook this into Claude Code, with a little bit of hacking around to get my AirPods' stem click to be captured.
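Glue-wise, the core loop is just record → transcribe → prompt → speak. A minimal sketch with the engines injected as plain callables (these names are placeholders of my own, not Parakeet's or Pocket TTS's actual APIs, and not the author's code):

```python
from typing import Callable

def voice_turn(
    record: Callable[[], bytes],          # capture audio until end-of-utterance
    transcribe: Callable[[bytes], str],   # STT, e.g. Parakeet v3 TDT
    ask: Callable[[str], str],            # send the prompt to Claude Code, get a reply
    speak: Callable[[str], None],         # TTS, e.g. Pocket TTS
) -> str:
    """One hands-free exchange: the user speaks, the assistant answers aloud."""
    audio = record()
    prompt = transcribe(audio)
    reply = ask(prompt)
    speak(reply)
    return reply
```

Injecting the stages as callables also makes the loop trivial to test with stubs before wiring in real audio.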

I'm having Claude narrate its thought process and everything it's doing in short, frequent messages, and I can interrupt it at any time with a stem click, which starts listening to me and sends the message once a sufficiently long pause is detected.
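For anyone curious how the "send once a sufficiently long pause is detected" part can work: a minimal sketch of RMS-based end-of-utterance detection. The thresholds and frame sizes below are illustrative guesses, not the author's tuned values:

```python
import math

def rms(frame):
    """Root-mean-square amplitude of one audio frame (floats in [-1, 1])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def utterance_complete(frames, silence_rms=0.02, min_silent_frames=25):
    """Return True once the trailing run of quiet frames is long enough.

    With ~20 ms frames, 25 silent frames is roughly half a second of pause.
    """
    silent = 0
    for frame in frames:
        if rms(frame) < silence_rms:
            silent += 1
        else:
            silent = 0  # any loud frame resets the pause counter
    return silent >= min_silent_frames
```

In practice you would feed live microphone frames into this and stop capturing as soon as it returns True, then ship the transcript off as the message.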

I stream the Claude Code session via AirPlay to my living room TV, so that I don't have to get close to the laptop if I need extra details about what it's doing.

Yesterday, I had it debug a custom WhatsApp integration (via [1]) hands-free while brushing my teeth. It can use `osascript` for OS integration, browse the web via Claude Code's builtin tools...
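As a sketch of what `osascript`-based OS integration can look like (the notification example is mine, not from the post; it only actually runs on macOS):

```python
import subprocess

def osascript_cmd(script: str) -> list[str]:
    """Build the argv for running a snippet of AppleScript via osascript."""
    return ["osascript", "-e", script]

def notify(title: str, body: str) -> None:
    """Post a macOS notification via AppleScript (macOS only)."""
    script = f'display notification "{body}" with title "{title}"'
    subprocess.run(osascript_cmd(script), check=True)
```

Anything scriptable from AppleScript (notifications, window focus, media control) becomes reachable from the agent this way.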

My back is thankful. This is really fun.

[1]: https://github.com/jlucaso1/whatsapp-rust


gdhkgdhkvff|24 days ago

On one hand, I think this project is super cool and something I would use and/or would have loved to build myself for my own use.

On the other hand, it makes me wonder if we’re just heading for a future where everyone is just always working, at all times, even while doing other things.

“Wow look at our daughter taking her first steps! She’s doing so… wait hold on… No, Claude. I said to name the class “potatoes”, not “‘pot’ followed by eight ‘O’s,” you dumb robot!”

mpolichette|24 days ago

I don't disagree, but I think there is the other side of that same coin... What if we could do other stuff while remaining productive?

Rather than the example of missing first steps, what if we had, "Ok Claude, prepare a few slides for my presentation, I'm going to watch my child's mid-day recital..."? Maybe you get a success/failure ping and maybe even need to step out for part of the event, but in another world you couldn't have gone at all.

volkk|24 days ago

we kind of already are with our phones and Slack, the difference at this point is negligible. i personally won't have airpods in 24/7 with my kid (or ever) so if i were doing something like this, it would be through my phone, which is already something i use fairly often. not too much difference there IMO (at least anecdotally speaking)

yberreby|24 days ago

That's a fair point, and I had the exact same thought while building this. I had previously resisted the urge to integrate Claude Code with e.g. ntfy.sh for this reason. But in practice, this works for me. I end up being less likely to spend time on the computer and more likely to be doing something on my feet.

For context, I'm a PhD student. Work-life balance is already... elusive.

andai|24 days ago

Working? That's the AI's job!

vmbm|24 days ago

I got into a bike accident yesterday and injured both of my arms. Fortunately the damage wasn't too severe, but it was bad enough that using a computer is rather difficult. So now I'm spending some of my idle time playing around with different options for voice control.

Like you, I am a little wary of OpenClaw, so I might try something similar to your setup as an alternative. So far I have gotten to the point where I can use voice dictation in Notepad to write comments and commands, but copying and pasting the text is enough of a struggle (compounded by the fact that my cat is competing with me for the keyboard and I am in no state to fend her off) that I am aiming to push things a bit further.

Sucks being injured, but having a nice distraction to keep my mind occupied has so far been a great way to pass the time.

mrbeep|24 days ago

Nice setup! If you ever want to add Telegram alongside WhatsApp, check out tgcli - it's a pure Rust Telegram CLI (no TDLib, no C/C++ deps).

https://github.com/dgrr/tgcli

It does sync, messages, send, search (FTS5), stickers, forum topics, and has socket IPC for automation. Install via cargo or homebrew. Seems like it'd fit nicely into what you're building.

yberreby|24 days ago

Looks like a nice library, thanks for sharing! I know Telegram bots are very popular and that the API story is quite nice, but I have tended to avoid Telegram. My preference would be to go through Signal. I just started looking into my options on this yesterday. Any particular reason why you chose Telegram?

azisk1|21 days ago

I would be happy to create such a system myself, though I don't have AirPods. I have Beats Flex, and perhaps I could somehow control the input with double volume-up and volume-down clicks. Or just use my phone somehow. Would you be willing to write more about your setup, perhaps a blog post?

yberreby|19 days ago

Since this has garnered some interest, I definitely will sit down and write a blog post when I have a little bit of time. I have upgraded the setup since that post a few days ago, and keep doing so continuously; it's always running in the background while I work. There are some rough edges, but the workflow feels like what Siri should have been.

Jommi|24 days ago

repo?

yberreby|24 days ago

I'm in the process of migrating from my first POC's disgusting mess of vibe-coded Python to a cleaner (and shareable) Rust architecture. It's going well but I will wait for it to stabilize a bit before sharing.

The main non-trivial parts are proper state machine / concurrency management, and AirPods interaction; in particular, detecting a stem click while the microphone is active. I worked around this by having the mic-off-to-mic-on transition use a media player Play event, and mic-on-to-mic-off do silence detection. It's super hacky but actually works surprisingly well.
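The state machine described above could be sketched roughly like this. The state and event names are my guesses at the structure, not the actual code:

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()       # mic off; Claude is narrating or working
    LISTENING = auto()  # mic on, accumulating the user's utterance

class Event(Enum):
    STEM_CLICK = auto()       # AirPods stem click, surfaced as a media Play event
    SILENCE_TIMEOUT = auto()  # sufficiently long pause detected while listening

def next_state(state: State, event: Event) -> State:
    """Mic-off -> mic-on on a stem click; mic-on -> mic-off on silence."""
    if state is State.IDLE and event is Event.STEM_CLICK:
        return State.LISTENING   # interrupt narration, start capturing
    if state is State.LISTENING and event is Event.SILENCE_TIMEOUT:
        return State.IDLE        # send the transcribed message
    return state                 # ignore events that don't apply in this state
```

Keeping the transitions in one pure function like this makes the concurrency side easier: audio and click callbacks just emit events into a queue, and a single consumer applies them.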

Currently looking into using `AVAudioApplication.setInputMuteStateChangeHandler(_:)`, like AirMute [1] does, so that I don't have to rely on silence detection and can manually terminate the voice command with a second click.

If you want to roll your own version of what I described today, it should be pretty easy to do so based on what I wrote if you have a Max x5-x20 plan and feed it to Opus. Bonus points, you get to customize it to your exact needs.

[1]: https://github.com/Solarphlare/AirMute