
Ask HN: Open-source voice assistants like Siri? Or can I build one on my own?

51 points| mettamage | 6 years ago | reply

For a data sensitive home grown project I want to integrate an open source voice assistant or roll my own.

Do you guys have any tips or experience with this and how to get started? I expect there to be some gotchas that I am unaware of.

The voice assistant does not need to be perfect but does need to be good. It should capture at least 80% of what I say in formal non-slang English correctly. I want to be able to speak in sentences, like I do with Siri.

What would your approach be to build or integrate this? Is it even feasible?

I am willing to invest one to two months full-time in learning the required machine learning. I currently know basic neural nets (Michael Nielsen’s book), basic statistics (e.g. logistic regression) and basic machine learning (SVM, k-NN, PCA, random forest, decision trees, bag of words).

17 comments

[+] synesthesiam|6 years ago|reply
You might be interested in trying Rhasspy: https://github.com/synesthesiam/rhasspy

Rhasspy lets you describe the set of sentences you want to speak using a simple grammar with annotations for named entities (https://rhasspy.readthedocs.io/en/latest/training/#sentences...). It outputs JSON over HTTP/Websockets/MQTT, so it works well with NodeRED, Home Assistant, etc.

Disclaimer: I created and maintain Rhasspy.
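
As a rough illustration of consuming that JSON output, here is a minimal Python sketch. The payload shape (an `intent` object with a `name`, plus an `entities` list of `entity`/`value` pairs), the `ChangeLightState` intent, and the handler are assumptions made up for this example, not a documented schema, so check the real output before relying on these field names.

```python
import json

# Hypothetical payload: the field names are assumptions for illustration,
# not a documented schema.
SAMPLE_PAYLOAD = """
{
  "text": "turn on the living room lamp",
  "intent": {"name": "ChangeLightState"},
  "entities": [
    {"entity": "state", "value": "on"},
    {"entity": "name", "value": "living room lamp"}
  ]
}
"""


def parse_intent(raw):
    """Return (intent_name, slots) extracted from a JSON string."""
    data = json.loads(raw)
    intent_name = data.get("intent", {}).get("name", "")
    # Flatten the entity list into a simple name -> value mapping.
    slots = {e["entity"]: e["value"] for e in data.get("entities", [])}
    return intent_name, slots


def change_light_state(slots):
    """Hypothetical handler: a real one would call the home automation system."""
    device = slots.get("name", "unknown device")
    state = slots.get("state", "off")
    return f"Switching {device} {state}"


# Map intent names to handler functions.
HANDLERS = {"ChangeLightState": change_light_state}


def dispatch(raw):
    """Parse a payload and route it to the matching handler, if any."""
    intent_name, slots = parse_intent(raw)
    handler = HANDLERS.get(intent_name)
    if handler is None:
        return f"No handler for intent {intent_name!r}"
    return handler(slots)


if __name__ == "__main__":
    print(dispatch(SAMPLE_PAYLOAD))
```

The same `dispatch` function could be called from an HTTP endpoint or an MQTT callback; only the transport changes, not the parsing.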

[+] nmstoker|6 years ago|reply
That looks really impressive, integrating a number of open source tools with a simple but nice UI.

Do you have any measures of how well it recognises spoken commands?

And have you seen anyone using it with non-American accents for English? (I ask as it relies on the CMU dictionary and tools I've seen use it tend to struggle with other accents, understandably)

[+] nmstoker|6 years ago|reply
This is definitely feasible. It'll depend on how happy you are to go with tools like those mentioned here, or whether you want to roll your own (which I expect would be enjoyable but much slower).

If you roll your own, you'll probably still want to reuse existing components for wake words, ASR and TTS, simply training them for your specific needs.

One tip: aim for higher than that 80% target - the sentence you mention that in had 16 words, so you'd expect 3.2 errors if you read that, which will quickly get annoying (it could throw your intent recognition off completely). If you've got the ability to restrict it to a narrow vocab then you can train a language model just with the minimal words needed and that should help the word error rate dramatically.
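
To make that arithmetic concrete, here is a small Python sketch. It treats the 80% figure as per-word accuracy, as in the estimate above; the independence assumption in `prob_sentence_correct` and the function names are mine, for illustration only. `word_error_rate` is the usual word-level edit distance divided by the reference length.

```python
def expected_word_errors(num_words, word_accuracy):
    """Expected number of misrecognised words in a sentence."""
    return num_words * (1.0 - word_accuracy)


def prob_sentence_correct(num_words, word_accuracy):
    """Chance that every word is right, ASSUMING errors are independent."""
    return word_accuracy ** num_words


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 ref words and first j hyp words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(expected_word_errors(16, 0.8))
    print(prob_sentence_correct(16, 0.8))
    print(word_error_rate("turn on the lamp", "turn off the lamp"))
```

Under the independence assumption, a 16-word sentence at 80% per-word accuracy comes out fully correct only about 3% of the time, which is why a higher target matters.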

[+] beshrkayali|6 years ago|reply
I tried a few of the available open-source assistant-like systems on my Raspberry Pi and found that none of them is even half as good as Google's or Apple's, which is expected.

After thinking about it though, I found that I don't really need voice recognition at all. What I really wanted was a device that can help me do a few things well: in my case, mainly listen to the radio, announce calendar events, and show train times (Stockholm in my case). So I just built that into a Raspberry Pi with a tiny screen and a few buttons. This little device is more than enough for me.

[+] kleer001|6 years ago|reply
After the initial excitement over the last six months with my new phone and new assistant I found I was using it less and less. It just takes so much longer than pressing the buttons and is far less accurate with complex things. But then again I haven't been trained on what it can do.
[+] eftokay83|6 years ago|reply
Could you go into more detail on this? Some pictures, maybe a blog post? :)
[+] mettamage|6 years ago|reply
Hmm... fair point. I was afraid this might be the thing.
[+] digital_voodoo|6 years ago|reply
I'm open-source oriented, and others have mentioned Mycroft, so I'd add Leon AI: https://getleon.ai/
[+] ginger_beer_m|6 years ago|reply
It's useless because the core machine learning part is implemented in JavaScript (Node.js). The problem is that most state-of-the-art code out there is written in Python.
[+] rotorblade|6 years ago|reply
I have been wondering something similar. Just a simple voice-command thing. Not a general-purpose, any-voice assistant, but just to be able to say a word 10 times to train (form some average) and then assign a command to that word. Something like what CellWriter does for handwriting, but for voice. Does this exist?
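
One way to read "say a word 10 times and form some average" is a nearest-centroid classifier, sketched below in Python. It assumes each recording has already been reduced to a fixed-length feature vector; that extraction step, the toy vectors, and the command names are hypothetical. It also ignores differences in speaking speed, which real systems usually handle with something like dynamic time warping.

```python
import math


def centroid(vectors):
    """Element-wise average of equally sized feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def train(examples):
    """examples: dict mapping command name -> list of feature vectors."""
    return {command: centroid(vectors) for command, vectors in examples.items()}


def classify(model, features, max_distance=None):
    """Return the command whose centroid is nearest, or None if too far away."""
    best_command, best_distance = None, math.inf
    for command, center in model.items():
        distance = math.dist(features, center)  # Euclidean distance (Python 3.8+)
        if distance < best_distance:
            best_command, best_distance = command, distance
    if max_distance is not None and best_distance > max_distance:
        return None  # reject sounds that match no trained command well
    return best_command


# Toy 2-D "features" standing in for real audio features (hypothetical data).
TOY_EXAMPLES = {
    "lights_on": [[1.0, 0.1], [0.9, 0.0], [1.1, 0.2]],
    "play_radio": [[0.0, 1.0], [0.1, 0.9], [0.2, 1.1]],
}

if __name__ == "__main__":
    model = train(TOY_EXAMPLES)
    print(classify(model, [0.95, 0.05]))
    print(classify(model, [5.0, 5.0], max_distance=1.0))
```

The `max_distance` threshold is what keeps background noise or unrelated words from triggering the nearest command by default.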
[+] beamatronic|6 years ago|reply
There was a voice controlled toy robot, Verbot, in the 1980s. Surely we can improve on that.
[+] sohodlers|6 years ago|reply
I heard there are some projects on GitHub. You may want to check them out.