npalli|12 years ago

It was confusing to figure out what this service is supposed to do; I had to look up the documentation. In summary, from what I can gather:

1. It doesn't do any speech recognition (speech -> text), so I'm not sure why they put Siri in the title. It's also not clear how they could 'hijack' the text from Siri to do this analysis. The ASR engines they mention (CMU, OpenEars) have pretty poor accuracy compared to Siri or Google Voice.
2. They appear to do some form of text normalization/correction; again, it's not clear how they do it.
3. The actual service they provide is a form of named entity recognition (confusingly named "intent", which clashes with the Android intent mechanism in their examples).
4. They also let you define your own entities to match. You can train them using a drop-down menu. I'm not sure how you could train hundreds of examples using point and click.

Is this different from Alchemy (or many others) because it's open source(?) http://www.alchemyapi.com/products/features/entity-extractio...

Given this service is for developers with an interest in NLP, it would have been good if they didn't hide behind a snow-job title like "Siri as a service".

ar7hur|12 years ago

> 1. It doesn't do any speech recognition (speech -> text), so I'm not sure why they put Siri in the title. It's also not clear how they could 'hijack' the text from Siri to do this analysis. The ASR engines they mention (CMU, OpenEars) have pretty poor accuracy compared to Siri or Google Voice.

Currently most Wit users use Google or Nuance with great success. You can even use Android's offline speech rec.

That said, CMU and OpenEars work well, as long as you provide them with good language models (which you can't do if you're hacking together a quick project). Our plan is for Wit to automatically generate the right language models from your instance configuration.

> 2. They appear to do some form of text normalization/correction; again, it's not clear how they do it. 3. The actual service they provide is a form of named entity recognition (confusingly named "intent", which clashes with the Android intent mechanism in their examples).

We abstract the full NLP stack for the developer. How we do it is not really what matters to our developers, as long as it works :) In fact, we use a combination of many different NLP and machine learning techniques.

> 4. They also let you define your own entities to match. You can train them using a drop-down menu. I'm not sure how you could train hundreds of examples using point and click.

You don't need to train hundreds of examples. Plus, our users are not NLP/ML experts, and they prefer a graphical UI. It's true that it could still be more efficient; we have good features on the roadmap for that :)

> Is this different from Alchemy (or many others) because it's open source(?) http://www.alchemyapi.com/products/features/entity-extractio....

Alchemy is great as a set of NLP tools, some of them quite academic, but it wasn't designed from scratch to solve the problem we're trying to solve: enabling the mass of developers to easily add a natural language interface to their apps.

aquark|12 years ago

I'm running this at home and it works great for adding custom actions to Siri.

film42|12 years ago

Maybe I'm out of line, but if you're planning on tearing into what's wrong with something, try to offer something positive as well. Your feedback was great and constructive; it's just not very nice.

I'll be honest: HN has a tendency (myself included) to have a first natural reaction of "how can I criticize this?" But just because something isn't faster than enterprise, or not as scalable, or not made in your framework of choice doesn't mean it's worthless. I think this project is amazing. Great job, and I can't wait to see this mature.

jasonkester|12 years ago

Is there any way to view this page with the effects turned off? With all the text constantly appearing and disappearing, I haven't yet made it to the end of a sentence, and therefore can't form an opinion about it.

I think there was a picture of a robot on the screen for a few seconds, but that's all I remember.

Would disabling JavaScript do the trick?

blandinw|12 years ago

EDIT: All animations (except "What we do") should be disabled. Please email me at [email protected] if you still have issues.

itry|12 years ago

Same here. I find "Siri as a service" an interesting project, but not interesting enough to cope with a page that blends content in and out and makes my head spin.

MasterScrat|12 years ago

You should make it clearer that you don't actually handle voice recognition. When I read "Developers use Wit to easily build a voice interface for their app," I expect you to handle things from start to finish.

Also, let me try it! It's frustrating because the UI looks like you can experiment, but it's only an animated demo (or am I missing something?). In particular, the mic logo is used to record on Google, and here it doesn't seem to do anything.

ar7hur|12 years ago

> You should make it clearer that you don't actually handle voice recognition.

You're right, we'll make it clearer on the landing page. A full out-of-the-box integration with some voice recognition engines (we love CMU Sphinx, open source) is on our roadmap.

> Also, let me try it!

We purposely didn't provide an "end-user" demo (something that would look like chatting with Siri) because we want to focus first on the developer experience, when they configure Wit to understand their own end-users' intents. You can request an invite and try this in less than 5 minutes.

lutusp|12 years ago

The word "Siri" doesn't belong in the title or the article, unless a Trabant advertisement has the right to mention Mercedes-Benz in its promotional text. The project does a primitive kind of voice recognition, but it doesn't use Siri.

On this topic, I invite people to try out my non-prototype, non-project toy that uses Google's support for HTML5 speech recognition. It's pretty funny how wrong things go when you try to say something even a bit out of the ordinary:

http://arachnoid.com/speech_to_text

If I say, "Now is the time for all good men to come to the aid of their country," an old teletype test sentence, the Google recognizer always nails it. If I say, "I hit an uncharted rock and my boat is being repaired," things go hilariously wrong, and differently every time.

mrosethompson|12 years ago

For instance: "In most cases, before beginning to listen, the browser will ask permission to monitor your microphone."

Came out as: "In Las Cruces, f listen, permission to monitor your microphone."

xauronx|12 years ago

I like the concept a lot, and I'm going to have to read more about it. One thing I'm unclear about is whether this does voice->text, or whether the developer does that and Wit handles translating the text into actions.

Just a heads up: Get Started on the pricing page does nothing. The natural progression for me is home page -> pricing -> OK, looks good, let's get started.

ar7hur|12 years ago

Wit takes the output of the voice recognition engine as input. It's quite robust to voice recognition errors. Most devs use Google's engine or the open source CMU Sphinx engine.

Fixed the Get Started link, thanks!

endlessvoid94|12 years ago

This is amazingly timely for me: I've been building my own version of Jarvis using speakeasy-nlp (a Node NLP library) and Chrome's built-in support for HTML5 webkitSpeechRecognition:

https://github.com/dpaola2/jarvis (work in progress)

I absolutely would love a better NLP API. Please let me in!

ar7hur|12 years ago

You should be able to sign up now.

ar7hur|12 years ago

Hey everyone, Wit guy here. We've been working on Wit for the past few months, and we think it's time to get your feedback. I'm happy to answer any questions you have.

Bringing Natural Language Understanding to the masses of developers is hard, and we still have a lot of work ahead of us. Please don't hesitate to reach out to us!

post_break|12 years ago

Interesting! Nothing happened after I registered with my GitHub account using Opera, though.

unknown|12 years ago

[deleted]

chrislomax|12 years ago

Nice concept. I just came back here to let you know that I don't know what is happening on that page, but I left it open about 45 minutes ago and noticed my fan kicked in a lot. It was that page I left open. It ended up taking 25% CPU. You didn't work on the iTunes software, did you??

Only messing, it was taking a lot of CPU though.

blandinw|12 years ago

Working on a fix now.

MasterScrat|12 years ago

I don't know if people will remember it and be receptive to this touch, but I like it.

ragebol|12 years ago

I also wonder how this compares to http://www.maluuba.com/ ?

ar7hur|12 years ago

Wit is 100% open and flexible: you can create any intent you need for your app, and you're not limited to a static set of domains/actions.

EDIT: @ragebol, we are very interested in ROS and robotics; don't hesitate to get in touch with me at arthur at wit dot ai. In the future, we would like to provide an off-the-shelf human/robot communication module for developers.

blandinw|12 years ago

Can you email me your GitHub username at [email protected] to make sure we got your request? Thanks!

fjabre|12 years ago

Your message isn't clear. AFAIK there is no official way to interact with Siri or Google voice rec.

It seems like Wit will take text that has already been translated from a user's voice and make it easily accessible to my application, but how does Wit access the text generated from a Siri request in the first place, for example? Does Wit have some other way of getting at data that has already been converted from voice to text by Siri, Google, or some other speech-to-text engine?

ar7hur|12 years ago

> AFAIK there is no official way to interact with Siri or Google voice rec.

Actually, there are ways. On Android devices, voice rec is available to devs (even offline, if the user enabled it!). We have a simple tutorial about how to integrate on Android: https://wit.ai/docs/android-tutorial

Right now on iOS you have two options (neither involves Siri, which is kept closed by Apple):

1/ Do the voice rec server-side (Siri does that)
2/ Use OpenEars to do it client-side

Server-side, you have many voice rec options, including the open source CMU Sphinx.

Providing a fully integrated solution with speech rec out of the box is on our roadmap.

drakaal|12 years ago

Except Stremor has a query language, so you don't have to do anywhere near as much heavy lifting.

ar7hur|12 years ago

Looks like you focus on search, summary, entity, and sentiment extraction with a rule-based approach.

Wit's focus is to power human/machine interfaces, and our priority is to provide developers with a 100% configurable solution, with no prior assumptions about their domain. And we don't believe in rules; we chose a machine learning approach.

rch|12 years ago

This would be great for open source projects, but I feel like I would trip over a very large pile of patents if I tried to build a product around it. I don't have any relevant experience myself, though, so it's just a feeling.

radley|12 years ago

The pricing model doesn't scale realistically and would require a subscription service for users. An app with 1M+ installs could make 1M+ calls per day, making this service $24k/month.

ar7hur|12 years ago

Meanwhile, you can decide to share your configuration data and get Wit for free (à la GitHub) :)

tonydiv|12 years ago

I would be wary of using the GitHub Octocat mascot, though. I believe Octocat is protected under copyright.

blandinw|12 years ago

After reading http://octodex.github.com/faq.html, we thought it was okay to use this image, given that we advertise and reference GitHub a lot, we integrate heavily with it, and we love Octocat!

Do you think we should remove it?

Thanks!

hackula1|12 years ago

The landing page looks like a minefield of legal issues. Marketing everything explicitly with references to Siri is asking for a lawsuit from Apple.

ar7hur|12 years ago

We share the same vision that voice will become the key human/machine interface, especially for the upcoming generation of wearable devices, home automation, etc.

I don't know if Ask Ziggy is 100% self-service for developers. That's a key requirement for us.
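The integration pattern discussed throughout the thread — a speech engine (Google, CMU Sphinx, OpenEars) produces a transcript, and a Wit-style NLU service turns that text into an intent plus entities — can be sketched as below. This is a minimal illustration, not Wit's actual API: the response shape, field names, and confidence threshold are all assumptions made up for this example.

```python
import json

# Hypothetical response from a Wit-style NLU service after the app submits
# the transcript produced by a speech recognition engine. The "outcome"
# structure and its field names are illustrative assumptions, not the real
# Wit API.
EXAMPLE_RESPONSE = json.dumps({
    "msg_body": "wake me up at 7am tomorrow",
    "outcome": {
        "intent": "set_alarm",
        "entities": {"time": "7am", "date": "tomorrow"},
        "confidence": 0.87,
    },
})

def parse_nlu_response(raw, min_confidence=0.5):
    """Extract (intent, entities) from a JSON NLU response.

    Returns (None, {}) when the service is not confident enough, so the
    app can fall back to asking the user again -- useful because the input
    transcript may already contain speech recognition errors.
    """
    outcome = json.loads(raw)["outcome"]
    if outcome.get("confidence", 0.0) < min_confidence:
        return None, {}
    return outcome["intent"], outcome["entities"]

intent, entities = parse_nlu_response(EXAMPLE_RESPONSE)
print(intent, entities)  # set_alarm {'time': '7am', 'date': 'tomorrow'}
```

The app itself never does NLP here: it only forwards text and dispatches on the returned intent name, which is the division of labor ar7hur describes (speech rec handled by an engine of your choice, language understanding handled by the service).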