Any plans for other languages and locales? I immediately noticed the temperature in F in the example about the weather in Lima. I think everybody there uses C, with the exception of American tourists :-) Seriously, it looks like a great product. It might even return too much data in the JSON. I wonder how to take advantage of all of that if I don't know what people are going to ask. They're going to ask silly questions just for fun even if I have a vertical app (example: a mortgage calculator), because this is not a web form with constrained input fields but free-form input. The numbers I get back in the answer could be unrelated to mortgages. Do you have examples of best practices? Maybe just write and speak the answer? Thanks.
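One way a vertical app can cope with off-topic queries is to whitelist the fields it knows how to use and fall back to speaking the answer for everything else. A minimal sketch of that idea — the JSON shape (`NativeData`, `SpokenResponse`) and field names are invented for illustration, not any real Houndify schema:

```python
# Hypothetical response shape; the real API's JSON will differ.
MORTGAGE_FIELDS = {"principal", "rate_percent", "term_months", "monthly_payment"}

def handle_response(response: dict) -> dict:
    """Keep only fields a mortgage app understands; otherwise just speak the answer."""
    data = response.get("NativeData", {})
    relevant = {k: v for k, v in data.items() if k in MORTGAGE_FIELDS}
    if relevant:
        return {"action": "fill_form", "fields": relevant}
    # Off-topic query ("what's the weather in Lima?"): fall back to text-to-speech.
    return {"action": "speak", "text": response.get("SpokenResponse", "")}

on_topic = {"NativeData": {"principal": 250000, "rate_percent": 3.5},
            "SpokenResponse": "Your principal is 250,000 dollars."}
off_topic = {"NativeData": {"temperature_f": 64},
             "SpokenResponse": "It is 64 degrees in Lima."}
```

The point is that the app never has to anticipate every silly question — anything outside its whitelist degrades gracefully to a spoken reply.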
Nice observation. Sadly, localization is an afterthought for a lot of developers. I am also curious to see how they handle other languages and locales, since I'm interested in learning how to use these kinds of systems.
The video is only 240p and quite shaky. As it was published by SoundHound Inc. itself, is this a marketing technique to make it look more amateurish?
Such low latency means the demo was done over Wi-Fi in the SoundHound building - especially if the speech recognition runs on the server side. Or which speech recognition software does that demo app use? Nuance software running on the client? Android 5 voice recognition isn't that fast.
After owning Echo, Roku and Fire TV, I'm super-bullish on voice commands finally being ready for prime time. It's a terrific interface for home audio, TV and car audio.
I've gotta think Apple will open up Siri to app developers sooner than later.
Definitely. I've been using voice commands in Android for about 5 years now (since ~2010) and I've consistently been shocked at how incredibly efficient an interface it is. The number of capabilities hooked up to voice control has only been increasing since then and it's been great.
I think voice with a screen is interesting, but voice alone can be difficult. What is the last voice-controlled IVR (phone system) that was awesome to interact with? I think it takes a combination of voice and something that can be confirmed with another "button", or something you can touch or push to confirm or cancel what you've "asked" it to do.
I think it can augment things well, but not be the prime time star.
It's already really easy to get fast, efficient access to large data sets, so I don't see much value in that. What is not fast, efficient, and easy is transforming natural language queries into computationally actionable ones.
I would find more value as a developer if, when given a natural language query, it returned a structured query. Then I could tweak the query to conform to whatever data retrieval API I wanted.
I don't think what I'm asking for has to be mutually exclusive with what they're currently offering. Give me the option to have houndify do some or all of the work for me.
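What this commenter is asking for could look like the following: a structured, intermediate representation of the query that the developer rewrites for their own retrieval API. The `parsed` dict below is a purely hypothetical example of such an intermediate form — its field names are invented, not any real Houndify output — with SQL as one possible target:

```python
# Hypothetical structured query a service might return instead of a final answer.
parsed = {
    "intent": "find_hotels",
    "filters": [("city", "=", "Seattle"), ("stars", ">=", 4)],
    "sort": ("price", "asc"),
    "limit": 3,
}

def to_sql(q: dict) -> str:
    """Rewrite the structured query for one particular retrieval API (here: SQL)."""
    where = " AND ".join(f"{col} {op} :{col}" for col, op, _ in q["filters"])
    col, direction = q["sort"]
    table = q["intent"].removeprefix("find_")  # "find_hotels" -> "hotels"
    return (f"SELECT * FROM {table} WHERE {where} "
            f"ORDER BY {col} {direction.upper()} LIMIT {q['limit']}")
```

Because the intermediate form is plain data, the same `parsed` dict could just as easily be rewritten into an Elasticsearch query or a REST call — that retargetability is the value being asked for.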
I am one of the developers for houndify.com, so I can answer this question for you!
We actually have an API endpoint dedicated to doing this for you. At the moment we have a concept of "domains", where developers use a proprietary language to help Hound understand topics. Using our API, you could technically do this yourself, and add functionality that doesn't currently exist on the platform.
You could use the hotel domain and get back a ton of pre-formatted data, or you could just get back speech-to-text, or you could specify hooks you want to take action on. I'm not a developer on the actual voice API itself, so I'm not the most informed, but perhaps that answers your question?
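The three integration styles described here — pre-formatted domain data, bare speech-to-text, and developer-registered hooks — can be sketched as a simple dispatcher. All of the key names (`CommandKind`, `HotelResults`, `Transcript`) and the hook mechanism are invented for illustration; the real Houndify schema may look quite different:

```python
# Hypothetical registry of developer-defined hooks for commands the
# platform doesn't cover natively.
hooks = {}

def hook(command_kind):
    """Decorator: register a handler for a given (hypothetical) command kind."""
    def register(fn):
        hooks[command_kind] = fn
        return fn
    return register

@hook("PlayPodcast")
def play_podcast(resp):
    return f"playing {resp['podcast']}"

def dispatch(resp: dict):
    kind = resp.get("CommandKind")
    if kind in hooks:                  # 3) developer hook takes the action
        return hooks[kind](resp)
    if "HotelResults" in resp:         # 1) rich, pre-formatted domain data
        return resp["HotelResults"]
    return resp.get("Transcript", "")  # 2) plain speech-to-text fallback
```

A client would call `dispatch` on each JSON response, so adding a new capability is just registering another hook.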
I've been using pocketsphinx with this neat Ruby gem[1]. It's really easy to use but has low accuracy (understands me correctly maybe half the time). I'm curious to see if Houndify does any better!
There is clearly a knowledge graph coupled with this in addition to the speech recognition. Sorry, "meaning" recognition. I feel like there is an opportunity to connect the deep knowledge graph of Wolfram Alpha -- or maybe Wolfram dropped the ball by not exposing their graph in a more usable way.
I wonder if it is based on the Freebase.com knowledge graph, which Google discontinued last month: http://www.freebase.com/ (and IBM recently bought the Blekko web search and knowledge graph engine as a Freebase replacement to power IBM Watson).
Does this require a network connection? I'd love to start adding speech-to-text interfaces to my apps, but most of the stuff I work on needs to be able to work without the network, and most of the speech-to-text engines these days are SaaS products in some form or another.
It's the complexity of the queries, and the contextual awareness, that makes it impressive. But yes, my immediate thought was that either Android speech-to-text or the Google speech API plugged into Wolfram Alpha might create a (much simpler, but also much easier) version of this.
That's insanely fast for compound natural language queries. I'm impressed.
Houndify looks interesting.
[1] https://github.com/watsonbox/pocketsphinx-ruby
Bug: scrolling down the page is very sluggish, using Chrome on Xubuntu 15.04.