lxe | 13 days ago
I built something similar for Linux (yapyap — push-to-talk with whisper.cpp). The "local is too slow" argument doesn't hold up anymore if you have any GPU at all. whisper large-v3-turbo with CUDA on an RTX card transcribes a full paragraph in under a second. Even on CPU, Parakeet is near-instant for short utterances.

The "deep context" feature is clever, but screenshotting and sending to a cloud LLM feels like massive overkill for fixing name spelling. The accessibility API approach someone mentioned upthread is the right call — grab the focused field's content, nearby labels, and the window title. That's a tiny text prompt a 3B local model handles in milliseconds. No screenshots, no cloud, no latency.

The real question with Groq-dependent tools: what happens when the free tier goes away? We've seen this movie before. Building on local models is slower today, but it doesn't have a rug-pull failure mode.
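A rough sketch of the accessibility-API idea: instead of a screenshot, collect a few strings of on-screen context (focused field text, its label, the window title) and assemble a tiny correction prompt for a small local model. Everything here — the function name, the prompt wording, the sample values — is illustrative, not taken from any of the tools discussed.

```python
def build_correction_prompt(transcript: str,
                            field_text: str,
                            field_label: str,
                            window_title: str) -> str:
    """Combine dictated text with on-screen context into a short prompt
    a ~3B local model can handle in milliseconds."""
    return (
        "Fix spelling of names and terms in the dictated text using the "
        "on-screen context. Return only the corrected text.\n"
        f"Window: {window_title}\n"
        f"Field label: {field_label}\n"
        f"Existing field text: {field_text}\n"
        f"Dictated: {transcript}"
    )

# Hypothetical values, as if read via the platform accessibility API.
prompt = build_correction_prompt(
    transcript="send it to john smyth",
    field_text="Re: quarterly report",
    field_label="Message body",
    window_title="Mail - John Smythe",
)
print(prompt)
```

The whole prompt is a handful of short lines — the kind of input where even a small local model's latency is dominated by nothing at all, compared to uploading and OCR-ing a screenshot.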
wolvoleo | 13 days ago
Too bad that tool no longer seems to be developed. Looking for something similar. But it's really nice to see what's possible with local models.
Wowfunhappy | 13 days ago
By "any GPU" you mean a physical, dedicated GPU card, right?
That's not a small requirement, especially on Macs.
0x457 | 8 days ago
I run the 120M Parakeet model for my STT thing. Even that tiny model works much better than macOS dictation these days.
h3lp | 12 days ago
The downside is that I couldn't get it to segment for different speakers. The consensus seemed to be to use a separate diarization tool.
lxe | 10 days ago
(and it did it perfectly without any edits required for me at all.)