lxe | 13 days ago
I built something similar for Linux (yapyap — push-to-talk with whisper.cpp). The "local is too slow" argument doesn't hold up anymore if you have any GPU at all. whisper large-v3-turbo with CUDA on an RTX card transcribes a full paragraph in under a second. Even on CPU, Parakeet is near-instant for short utterances.

The "deep context" feature is clever, but screenshotting and sending to a cloud LLM feels like massive overkill for fixing name spelling. The accessibility API approach someone mentioned upthread is the right call — grab the focused field's content, nearby labels, and the window title. That's a tiny text prompt a 3B local model handles in milliseconds. No screenshots, no cloud, no latency.

The real question with Groq-dependent tools: what happens when the free tier goes away? We've seen this movie before. Building on local models is slower today, but it doesn't have a rug-pull failure mode.
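A rough sketch of the accessibility-API idea: instead of a screenshot, collect a few strings of on-screen context (focused field text, its label, the window title) and assemble a tiny correction prompt for a small local model. Everything here — the function name, the prompt wording, the sample values — is illustrative, not taken from any of the tools discussed.

```python
def build_correction_prompt(transcript: str,
                            field_text: str,
                            field_label: str,
                            window_title: str) -> str:
    """Combine dictated text with on-screen context into a short prompt
    a ~3B local model can handle in milliseconds."""
    return (
        "Fix spelling of names and terms in the dictated text using the "
        "on-screen context. Return only the corrected text.\n"
        f"Window: {window_title}\n"
        f"Field label: {field_label}\n"
        f"Existing field text: {field_text}\n"
        f"Dictated: {transcript}"
    )

# Hypothetical values, as if read via the platform accessibility API.
prompt = build_correction_prompt(
    transcript="send it to john smyth",
    field_text="Re: quarterly report",
    field_label="Message body",
    window_title="Mail - John Smythe",
)
print(prompt)
```

The whole prompt is a handful of short lines — the kind of input where even a small local model's latency is dominated by nothing at all, compared to uploading and OCR-ing a screenshot.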
wolvoleo | 13 days ago
Too bad that tool no longer seems to be developed. Looking for something similar. But it's really nice to see what's possible with local models.
Wowfunhappy | 13 days ago
By "any GPU" you mean a physical, dedicated GPU card, right?
That's not a small requirement, especially on Macs.
0x457 | 8 days ago
I run the 120M Parakeet model for my STT thing. Even that tiny model works much better than macOS dictation these days.
h3lp | 12 days ago
The downside is that I couldn't get it to segment for different speakers. The consensus seemed to be to use a separate diarization tool.
lxe | 10 days ago
(and it did it perfectly without any edits required for me at all.)