This post starts out talking about expecting to spend around $1,000.
There are at least two cross-platform projects where the biggest expense is a microphone instead of software.
1. My project, Talon. Windows/Linux/Mac support, and a first-party local speech recognition engine that is pretty good and getting better. It’s free, but the engine is in a private beta (which is $15/mo to support development, optional if there’s a financial issue).
2. Serenade. They are VC-backed. Currently free, unsure about their longer-term plans. They use cloud-based recognition.
I have intermittent RSI and I've been using Talon for 1.5 years for programming (web) with IntelliJ and Spacemacs. I reckon I'm as productive as using my hands if I use only voice. When my hands don't hurt and I mix Talon with my hands, I feel I can do more than I could do with my hands alone. Thanks, lunixbochs. Talon is great.
This comment fails to mention Dragonfly with my Kaldi Active Grammar backend [1], which is cross-platform (Windows/Linux now, with Mac functional and to be released soon), completely free with no private beta features (although I do accept donations), and 100% open source (unlike Talon). The speech recognition is local, with extremely low latency. See the video demonstration [2] on the project page. I think the underlying Kaldi engine delivers unmatched accuracy as a free non-commercial engine.
I created Kaldi Active Grammar because I didn't trust relying on closed source software for something so crucial to my productivity, where a decision by an outside party determines whether I can function. As a bonus, open source means I can make it work better to fit my needs than closed source ever could.
Furthermore, the original article mentions Caster (which is built on Dragonfly), but doesn't mention that KaldiAG works with it, and that work is underway to expand Caster's platform support.

[1] https://github.com/daanzu/kaldi-active-grammar

[2] https://youtu.be/Qk1mGbIJx3s
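As a rough illustration of the grammar-constrained recognition idea (this is not KaldiAG's actual API, just a sketch of why restricting the decoder to the currently active command set helps accuracy so much):

```python
import difflib

# Toy stand-in for grammar-constrained decoding: instead of free-form
# dictation, the recognizer only has to choose among the currently
# active command phrases, which is what makes command accuracy so high.
ACTIVE_GRAMMAR = ["go to definition", "find usages", "comment class", "save file"]

def decode(noisy_transcript: str, grammar: list[str]) -> str:
    """Pick the active grammar phrase closest to the raw acoustic hypothesis."""
    match = difflib.get_close_matches(noisy_transcript, grammar, n=1, cutoff=0.0)
    return match[0]

print(decode("go too definition", ACTIVE_GRAMMAR))  # snaps to the closest active phrase
```

Because the candidate set is tiny, even a badly misheard transcript snaps back to a valid command; open-vocabulary dictation has no such safety net.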
I have a non-technical friend who says Dragon is too unreliable to be usable. My intuition was that the problem is probably somewhere else, and that if Dragon doesn't work, nothing will. My assumption has been that competing voice recognition software competes on price. Is that assumption wrong?
(For example, I suspect the microphone their school supplies them with may be no good)
On the homepage for Talon it lists macOS under dependencies. I've actually come across the homepage before and didn't look into it further because I thought it was Mac-only.

I've been waiting to try Talon for ages - glad I can give it a try now.
I had several onsets of RSI a few years back, and had to turn to voice coding as a last resort, after stretches, pauses, and ergonomic everything did not do the job. It was pretty awful.
But then, after having seen doctors and neurologists, and finally a physical therapist, I came across my salvation:
- Exercising my hands.
I very rarely see this mentioned for some reason. I exercised regularly, but only the bigger muscle groups, rarely grip strength and wrist strength. It felt counter-intuitive to exercise hands that were already aching painfully when typing, but after using grip weights and other methods to work out my hands and wrists, the pain went away quickly! If you are not diagnosed with carpal tunnel, and not already doing this, definitely try it - it saved my career.
Could you please elaborate on the exercises you're finding helpful? I have mild RSI myself (outer part of the forearm, near the elbow) and have been trying some eccentric exercises for a few months now, but I'm not seeing a big improvement.
Counterpoint: After a promising two weeks, it did nothing for me, back to square one. OTOH a colleague of mine recommended this after having personal success.
It is frustrating that there seems to be only trial and error in all of this.
Just want to point out a very accurate, yet completely inelegant solution to voice text input: Saying each letter and symbol individually.
If you haven't tried it, you'll think I'm crazy, but it's amazing how fast computers can recognize individual letter names. You can just blurt them all out. I discovered this while entering a domain name by voice - just spell it out and poof, no problem, no corrections.
Not sure I'd want to spend all day doing it that way, but rather than fighting with voice recognition for misunderstood homonyms, just fall back on individual keys.
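The spelling fallback is easy to sketch as a table from spoken tokens to characters (the token names below are invented for illustration, not any engine's real vocabulary):

```python
# Toy spelling-mode mapping from spoken tokens to literal characters.
SPOKEN = {
    "alpha": "a", "bravo": "b", "charlie": "c", "delta": "d",
    "echo": "e", "dot": ".", "dash": "-", "slash": "/",
}

def spell(tokens):
    """Turn a sequence of spoken tokens into literal text."""
    return "".join(SPOKEN[t] for t in tokens)

print(spell(["alpha", "bravo", "charlie", "dot", "delta", "echo"]))  # abc.de
```

Since each token maps to exactly one character, there are no homonym ambiguities to fight, which is why this mode is so reliable for domain names and identifiers.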
I had my second surgery just a few months ago. I can type again, but each time I had to use my left hand for months. Initially, it feels like your brain doesn't function properly anymore (not to mention the psychological effort you have to make in order to stay focused on work when you feel your hands are falling apart). Keyboard speed is directly related to how fast you can move your hands to support your thought flow. I tried Kinesis and even a split vertical keyboard (Keyboardio), but neither avoided the pain and numbness that came with typing. The other problem with thumb-cluster keyboards is that your IDE productivity goes to zero. I was faster with just my left hand on a regular keyboard than with both. I think this would be fixable with a good amount of time remapping shortcuts, etc. Now that my hand works again, I think I should start spending time getting used to my Keyboardio and at least try to buy some time.
The "voice coding" space is maybe not a mess, but far from great or even acceptable. However, there seem to be more recent efforts to make better tools. I would definitely check https://serenade.ai/ out.
The main problem, I think, is that "voice coding" is too focused on editor typing, which tools can't do right: combined with code syntax, it becomes too complex. Instead, they should focus on higher-level actions (which, by the way, Serenade does) along with a different approach to typing. I think Vim is a good example of where editing should be; IntelliJ refactoring is where voice coding should start. With all the AI buzz, it's unbelievable how bad voice recognition is. I'm not talking about "Siri, set an alarm", but about separating context from tone, not having to say things 2-3 times, having good response latency, etc.
Lastly, I wish there was simple voice assistance for code navigation - like go to definition, find usages, etc. This is much simpler to "parse" than code structure. Unfortunately, this is not even tackled by any tool as far as I've seen.
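A navigation layer like that is essentially a fixed phrase-to-action table, which is far easier to "parse" than arbitrary code. A minimal sketch (the action identifiers below are invented, not any real IDE's command names):

```python
# Hypothetical mapping from fixed voice phrases to editor actions.
NAV_COMMANDS = {
    "go to definition": "editor.goToDefinition",
    "find usages": "editor.findUsages",
    "go back": "editor.navigateBack",
}

def dispatch(utterance: str) -> str:
    """Resolve a fixed navigation phrase to an editor action identifier."""
    action = NAV_COMMANDS.get(utterance.strip().lower())
    if action is None:
        raise ValueError(f"unknown command: {utterance!r}")
    return action

print(dispatch("Go to definition"))  # editor.goToDefinition
```

Because the phrase set is closed, recognition and dispatch stay trivially reliable compared to free dictation.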
I’m one of the creators of Serenade—thanks for mentioning us! We totally agree about the need for higher-level layers of abstraction, and we’re working on some of the code navigation functionality you mentioned right now. If you have any other ideas or feedback, we’d love to chat more, I’m [email protected].
“I type fast enough that I have never had much use for snippets in my text editor. Perhaps if I’d used them more effectively, I would not have developed the RSI symptoms that I have!”
Often we have discussions about advanced code completion on HN. Many developers feel they don’t need it, or that it gets in their way, for example.
Reading stories like this convinces me even more that our editors (tools) need to be smarter. There is so much repetition in coding, it’s hard to believe we can’t do better.

This tool is often mentioned: https://tabnine.com/

https://kite.com/integrations/kite-vs-tabnine/
I like the idea of TabNine, unfortunately it doesn't seem to be super well supported :(
There is no news on whether it is being actively developed, and the current implementation is unusable in a corporate environment because it can't dial home through a proxy, so it refuses to activate the license.
It's a shame too because it was basically a "shut up and take my money" reaction from me. I'd pay for this product. I'd pay good money for this product.
I am relatively sure that you could tell that my keyboard is primarily used for coding C# in Visual Studio just by looking at which keys are the most worn and which carry a significant layer of dust on them.
The keys that complete an Intellisense selection (space, period, semi-colon, Enter) are nearly ground down to nubs, and the open-paren and open-bracket keys are worn smooth, while the corresponding close keys indicate near complete disuse. Similarly for F5 (Start Debugger) and F10 (Step Over) compared to the rest of the F-keys.
We should use better languages. Better understood and integrated with the editor, and better at expressing every level of abstraction. *cough* pretty much any lisp *cough*

Fingers crossed.
I've been voice coding for about 5 years now. For those of you not on Windows: I use Talon on Mac (the Linux version is in beta). It works quite well and I'm at least as productive writing code by voice as I ever was by hand. I was someone who would spend the time to get my Emacs and then later Vim configs highly optimized, but there is something liberating about not constraining yourself to key bindings. I used to type gcC to comment a Python class in Vim; now I say "comment class". For commands you type frequently enough to get into muscle memory this isn't a huge gain, but for all the things you don't use regularly, it's so much easier for me to remember normal words than keyboard shortcuts.
At this point all of the projects mentioned in this thread (caster/talon/serenade) have some option for supporting the three main (win/lin/mac) platforms.
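The "comment class" example above boils down to a phrase-to-keystroke grammar. A toy sketch, loosely in the spirit of Talon/Dragonfly command grammars (this is not either project's real syntax; the rules are invented):

```python
import re

# Invented phrase -> keystroke rules. "gcC" assumes a Vim commenting
# plugin binding, as in the parent comment.
RULES = [
    (re.compile(r"^comment class$"), lambda m: "gcC"),
    (re.compile(r"^comment line$"), lambda m: "gcc"),
    # "say <text>" inserts literal text: 'i' enters insert mode, Esc leaves it.
    (re.compile(r"^say (?P<text>.+)$"), lambda m: "i" + m.group("text") + "\x1b"),
]

def to_keys(utterance: str) -> str:
    """Translate a spoken phrase into the keystrokes it should emit."""
    for pattern, action in RULES:
        match = pattern.match(utterance)
        if match:
            return action(match)
    raise ValueError(f"no rule matches {utterance!r}")

print(to_keys("comment class"))  # gcC
```

The point of the design is that rarely used operations get memorable English names instead of arbitrary chords, which is exactly the win described above.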
FYI, I thought my programming career was over due to RSI.
Now, I only type while wearing long-sleeves.
And of course, I still have to take regular breaks.
I no longer suffer RSI symptoms. I'm guessing it's because long sleeves increase blood flow to the area, and perhaps the warmth helps keep ligaments and muscles flexible and loose.

Simple solution, but took a while to figure out.

Hopefully this helps someone reading this.
Greetings everyone, I help maintain the Caster project. The key difference from other solutions out there is that we seek to support a completely open source voice coding stack. Open source is the only way to go long term if you're going to be using a tool for most of your life. Fortunately, for some it acts as a bridge until their RSI symptoms become manageable or go into remission.
We are working towards cross-platform support for Linux and Mac, as well as adding support for Kaldi, via daanzu's Kaldi Active Grammar: https://github.com/daanzu/kaldi-active-grammar. Dragonfly is already cross-platform, so only a few Windows-specific functions remain to be ported in Caster.

Talon may be free, but it is closed source.
I said this in another comment, but it can't be emphasized enough: I created Kaldi Active Grammar because I didn't trust relying on closed source software for something so crucial to my productivity, where a decision by an outside party determines whether I can function. As a bonus, open source means I can make it work better to fit my needs than closed source ever could.
For what it's worth, my voice is quite abnormal, so most untrained speech recognition is terrible for me, and even performing the normal "training" for Dragon still resulted in very poor accuracy. However, apparently their training is quite limited, because once I developed Kaldi Active Grammar, and did my own direct training, the results were fantastic in comparison, with orders of magnitude better accuracy.

Open source is what allows this.
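Accuracy differences like this are usually measured as word error rate (WER): word-level edit distance divided by reference length. A minimal implementation for checking your own training runs:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("go to definition", "go two definition"))  # one substitution in three words
```

Running this over a held-out set of your own recordings before and after training makes "orders of magnitude better" a concrete, reproducible number.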
It's still surprising to see Dragon Speech Recognition as the recommended (and only) choice here.

https://www.youtube.com/watch?v=fBhBqlQj00Q

https://www.youtube.com/watch?v=hGPNs5C1Lp0
Is anyone working on decent speech recognition for Mac/Linux or know good resources for that? The ideal output is a stream of what could have been said, as well as some alternatives, each with a confidence.
Every alternative I've tried has not been as effective as the version of Dragon I used from 2011. I think the focus on accents and training is a big thing here -- I'm happy to spend a couple of hours training it for better results.

https://www.youtube.com/watch?v=Mz3JeYfBTcY
Off topic, but I really wish I could find an Alexa-like "smart speaker" capable of voice programming.
For example:
1. I would like to command the speaker to listen for a keyword, like the Fizz Buzz Test [1], and respond when I count to a certain number.
2. Ask the speaker to remind me of something when hearing certain topics during a conversation. Much like the "if" keyword in text based computer programming languages.
3. Program a poem into the speaker over the microphone, tutor my kids to memorize it, and correct the wrong parts. Share the snippet with other parents. Program simple homemade riddles and tests over voice.
4. The ability to store certain list/map structure as global variables. e.g. asking the speaker, who is the second oldest son in this family? Who got up first this morning?
5. Voice memos and a search engine. Stored and indexed securely offline on my home NAS.

[1]: https://wiki.c2.com/?FizzBuzzTest
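Item 2 on that list can be sketched as a simple filter over a stream of transcribed utterances (all names and phrasing below are illustrative, not any platform's API):

```python
# Toy topic-triggered reminders over a stream of transcribed utterances,
# like an "if" statement attached to a live conversation.
REMINDERS = {
    "groceries": "You wanted to buy milk.",
    "vacation": "Renew your passport first.",
}

def reminders_for(transcript_stream):
    """Yield a reminder whenever a watched topic word is heard."""
    for utterance in transcript_stream:
        for topic, reminder in REMINDERS.items():
            if topic in utterance.lower():
                yield reminder

stream = ["We should plan the vacation soon", "Anything else?"]
print(list(reminders_for(stream)))  # ['Renew your passport first.']
```

The hard part in practice is the always-on transcription, not this trigger logic, which is why the replies below suggest wiring it up to existing speaker platforms.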
I think all the big smart speaker makers would tell you to just write a serverless function and hook it up to your speaker. It's unlikely they would ever create such functionality.
Most likely, if you want to make it work, you'd either have to build your own smart speaker or write a serverless function that uses one of the voice programming tools mentioned in this thread as its backend.
Hardware-wise, I would probably start with the Seeed Studio ReSpeaker array. Otherwise, can you write up a more specific use case list with specific commands and responses? This sounds fun and I can probably help you make this happen.
You gotta be careful not to get RSI in your vocal cords. I almost did when I tried voice coding years ago. For me at least, the strain tends to shift to wherever the “work” is being done. Same thing happened when I experimented with eye tracking.
I've had some close calls with RSI and the most helpful things for me were:
1) Getting an ergo keyboard, in my case the Microsoft Sculpt
2) Remapping my keys to better match my workflow. Left and right parens are mapped to left and right Shift; similarly, Ctrl for braces and Alt for brackets. Mapping Caps Lock to delete one word back was also a big one. Further, I have the number pad on the left, both to reduce how far my mouse hand has to travel and to remap all the numpad keys to useful programming commands.
A minor tip for people fiddling with keyboards/layout. Be careful of regular pinky use for modifiers. Pinkies are weak and overuse can cause ulnar issues.
Corollary, if you have ulnar (pinky side) issues in your forearm, reduce pinky use.
Just a quick shout out to Microsoft (no affiliation) - their Sculpt keyboard has taken all the discomfort out of 15-hour coding sessions. If you're a professional developer, get one.
The Sculpt and Kensington expert trackball (big trackball, not thumb trackball), while not fixing my issues completely, did improve things for me quite a bit.
That said, barring genetic lottery, as the sibling said: enough 15-hour sessions are going to give you RSI no matter what you’re typing on. There’s no way to do that every day and still exercise, take breaks, and sleep enough. Balance is important.

I can't imagine working 15 hours straight would be comfortable for me no matter what tools I was using.
I've been using the Microsoft Natural Ergo for 20+ years. It's the very first thing I ask for when I get a new job, or if they don't supply keyboards, the first thing I bring from home.
I've never had any RSI type symptoms or even fatigue after long typing sessions.
The sculpt seems to just be a fancier wireless version of the same thing (although I haven't tried it so I could be wrong).
For those on Linux, I've been working on a Talon inspired voice coding program called Osprey that uses the Google Cloud speech to text API: https://github.com/osprey-voice/osprey.
It's still very much a work in progress but it's already been working very well for me and I'm actually using it to type out this response right now.
You can solve most carpal tunnel issues by massaging your forearms with a ball (or your knuckles).
I know it sounds crazy, but I solved a very intense bout of plantar fasciitis by massaging my calves. Took some time, but eventually the pain went away.
And when I had carpal tunnel, I did the same (although not leaning heavily on my wrists helped a lot too).
You'll know if you're hitting the right spot because it'll hurt. A lot.
There’s recently Common Voice from Mozilla, which is a huge free English dataset (1,500 hours and growing), and wer_are_we [1] has shown really impressive accuracy increases in published research over the past few years. Exciting times.

[1] https://github.com/syhw/wer_are_we

Rest is ultimately the best way to prevent hand injuries, and since I spend most of my time in Chrome, this extension lets me do it hands-free.
I suspect this setup would not be ideal: worsening background noise would be annoying for all concerned, and even more challenging than an open-plan office already is. Your colleagues might not appreciate it unless you're already surrounded by people talking all day (e.g. support or sales teams on calls).
That aside, in terms of worrying about your mic picking up other people's voices and the voice dictation getting confused, most dedicated microphones these days (i.e. not ones that are built into your phone's headphones), are pretty good at background noise reduction.
I've not used the one OP recommends - I'd never have considered a table based mic like that before - but the noise reduction on the Plantronics Blackwire 3215 headset I use is so good that if I move the mic boom a few inches up or down away from my mouth, people can't really hear me on calls. It's superb at getting rid of background noises, and if somebody else was in my home office using voice dictation it would not be picked up by my headset.
I use Talon with Dragon and an Audio Technica PRO8HE.
I mostly whisper to the system, and people in my office say they don't hear me at all. Also, whispering keeps my throat from getting tired.
There are mics that are really good at background rejection, if you’re worried about external voices filtering in.
1. Cheapest is probably a USB dynamic mic of some kind.
2. Next is a Stenomask at around $250
3. A lot of folks swear by the DPA d:fine cardioid, which is $800-1300 including an interface. There’s also a cardioid shirt-worn lavalier I’m interested in trying sometime, which uses the same interface but the mic is $150 cheaper ($650 -> $500).
If you’re worried about other people hearing you, your options include an isolated area, playing noise (white noise or music?), or using a StenoMask, which blocks sound in both directions.
Remember in the US your employer is required under the ADA to provide “reasonable accommodations” for disability, which may include a private working space, pair programming, or letting you work from home more often.