(no title)
blutoot | 1 month ago
I do, however, wonder if there is a way all these TTS tools can get to the next level. The generated text should not be just a verbatim copy of what I just said, but depending on the context, it should elaborate. For example, if my cursor is actively inside an editor/IDE with some code, my coding-related verbal prompts should actually generate the right/desired code in that IDE.
Perhaps this is a bit of combining TTS with computer-use.
mritchie712|1 month ago
I have a claude skill `/record` that runs the CLI which starts a new recording. I debug, research, etc., then say "finito" (or choose your own stopword). It outputs a markdown file with your transcribed speech interleaved with screenshots and text that you copied. You can say other keywords like "marco" and it will take a screenshot hands-free.
When the session ends, claude reads the timeline (e.g. looks at screenshots) and gets to work.
I can clean it up and push to github if anyone would get use out of it.
mritchie712|1 month ago
heliostatic|1 month ago
wanderingmind|1 month ago
sipjca|1 month ago
I initially had a ton of keyboard shortcuts in handy for myself when I had a broken finger and was in a cast. It let me play with the simplest form of this contextual thing, as shortcuts could effectively be mapped to certain apps with very clear uses cases
eddyg|1 month ago
There’s also more recent-ish research, like https://dl.acm.org/doi/fullHtml/10.1145/3571884.3597130
hasperdi|1 month ago
That CLI bit I mentioned earlier is already possible. For instance, on macOS there’s an app called MacWhisper that can send dictation output to an OpenAI‑compatible endpoint.
sipjca|1 month ago
ryanshrott|1 month ago
[deleted]