Some context on the new services: they are built on technology that comes from IBM Research and was moved into the Watson group in 2014. Some, like speech, have been in development for more than 50 years. None of these technologies overlap with the Watson Jeopardy stack (except for the Watson voice). We will release that stack later this year as a series of services that let you build a full Q&A/dialog application.
All the Watson services are still in beta but will start going GA very soon (the first one next month). If you have any questions, please fire up; the Watson team is ready to answer.
I find it terribly confusing. It does not explain what instances are. Do I need an instance to access some of the services?
I just want to access some services via API from my own servers. The documentation is not that good; at a minimum there should be curl examples, for instance for STT and TTS.
Does the STT have speaker identification, or does it output text as one stream?
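To make it concrete, something like the sketch below is all I'm asking for; note that every value in it (host, path, credentials) is a placeholder I made up, not the real documented endpoint:

```shell
# Placeholder endpoint and credentials -- <gateway-host>, <username> and
# <password> are stand-ins, not the documented values for the beta service.
TTS_URL="https://<gateway-host>/text-to-speech-beta/api/v1/synthesize"

tts_curl_cmd() {
  # Build the curl invocation: HTTP basic auth with the service credentials,
  # the text as a URL-encoded query parameter, WAV audio written to a file.
  printf "curl -u <username>:<password> -H 'Accept: audio/wav' -G --data-urlencode 'text=Hello world' -o hello.wav '%s'\n" "$TTS_URL"
}

tts_curl_cmd  # prints the command; substitute real service values to run it
```

Presumably the same shape would work for STT, with the audio posted in the request body instead of text in the query string.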
Color 71%
Human 67%
Photo 65%
Dog 59%
Person 57%
Placental_Mammal 56%
Animal 50%
Long_Jump 50%
Huh?
This isn't me cherry-picking bad results; aside from their demos, I'm not finding any photos that are accurately classified. I even tried a headshot of a person isolated on a white background, and Watson told me I had uploaded a photo of "shoes".
Seriously - how is this data useful? What could I build with this level of accuracy?
Watson team - do you agree? Is this product about to get a lot better, soon, or is this considered "pretty good"?
The top 3 classes in your example are actually correct - it is a color photo of a human. But we expect it to get much better over time. Only real world usage will allow us to make real improvement - and that's why we are eager to release early.
We also believe that the first applications (e.g., classifying animals, plants, or landmarks in dedicated apps) will have narrower use cases that give better accuracy.
The problem with AI systems has almost always been that they tend to be both right and wrong in ways that humans would never be.
Watson gives high confidence to it being a color photo of a human (which is a Person, and an Animal). Which is right. But the only part that a human would ever really care about is that there's another human in the picture.
It gets things wrong with reasonable confidence for Dog, Placental_Mammal, and Long_Jump; importantly, these are wrong in ways that a human would never be.
Just as important are the omissions. A human would probably describe this as a picture of a girl or young woman, laughing or smiling, with curly brown hair wearing a scarf -- and maybe some other incidental information.
Of that description, Watson only got the superclass of one part correct (Human, Person) and didn't provide any of the other parts.
AI fundamentally "thinks" differently than a human, and that makes it hard for humans to use AI as a cognitive enhancement tool in the same way humans use calculators, books, writing, etc. We don't trust what an AI is doing or the answers it provides, because AIs tend to give answers that are right but irrelevant or weirdly wrong, or to omit obvious and necessary information that a human would want.
If humans ever encounter aliens, it's likely that their mode of thinking will be just as different. So bridging that gap, and figuring out how to make AI like this useful could be a useful endeavor.
I prefer the Watson version voicing a sample paragraph. Both are good enough for an application that selects on price. For a voice-first application, maybe Watson is better for TTS.
For speech to text, Nuance has been the leader, e.g. Apple's Siri. Has anyone compared IBM speech recognition to Nuance, Microsoft & Google?
We know we have strong core speech technology, based on various comparisons we have done in the context of competitive evaluations conducted in conjunction with various government-funded speech programs. However, our service is still very new. We could have waited for months to tune it, but our primary goal here is to solicit feedback from the community on how to make our services easier to use, especially in the context of our other platform services. We don't want to wait until the design is so mature that it is impossible to change, so any and all feedback is very welcome!
It is getting increasingly difficult to pick one as the clear leader for "natural sounding". The results are good enough for voicing canned text, and certainly better enunciated than many thick-accented English speakers. Improvements through training can still be made in parsing the text.
For example, IBM Watson interprets "IT" as "it", in the following sentence.
Thank you for calling the IT department.
Vocalware and CereProc correctly parse that.
The people I would really like to hear opinions from are professional voice actors, though they would understandably be leery of lending a hand to improve TTS. Is there a standardized form of writing text that communicates the kind of emphasis, placement of silence, and warping of phonemes these actors use in their delivery to concisely convey emotion, that TTS products could adopt?
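(Partially answering my own question: the W3C's SSML standard does let you mark up pauses, emphasis, prosody, and pronunciation hints, though it captures only a coarse slice of what a voice actor does. A sketch, using element names from the SSML 1.0 spec; whether any given engine honors them is another question:)

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis">
  Thank you for calling the
  <say-as interpret-as="characters">IT</say-as> department.
  <break time="400ms"/>
  We <emphasis level="strong">will</emphasis> answer,
  <prosody rate="slow">eventually</prosody>.
</speak>
```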
The text-to-speech is surprisingly good, but I'm amazed at one thing, and not in a good way: the Spanish voice can't pronounce the word "Español". It pronounces it as "Espanol" with a hard "n" sound. In fact, it seems to pronounce all "ñ"s as "n"s. How that kind of an oversight got into the system, I'll never know. Did no one think to check?
Edit: And to add insult to injury, the English voices do pronounce "Español" correctly!
When this was first announced I remember reading about their pricing model where they would take a percentage of app revenue. I'm glad to see they offer flat pay-as-you-go pricing now. Some of the Watson services are intriguing.
I'm on the Watson team and we're interested in learning from developers to make our APIs and documentation easier to use. Have feedback? We'd love to hear it. jsstylos@us.ibm.com Twitter: @jsstylos
The text-to-speech is actually a little nicer than Siri or Cortana, but not groundbreaking. This was the only one of the 5 that I thought did well. The rest might have been better without demo pages.
I tried using Watson a month ago without much success. I wanted to classify some arbitrary text, i.e., to say that this text, for example, belongs to this category. But as far as I could understand, it only allows using their own datasets.
It's not possible to train their service with your data, unlike wit.ai for example. Seems obvious to me that people would want to train with their own data.
Pretty much all the services that we are releasing will have some adaptation capabilities, allowing you to provide your own data, create your own models, etc., at some point. Stay tuned.
I decided to test it a little. I copied phoneme challenges and nonsensical phrasing from the web, then added some material that I know has caused problems in the past.
-----
Let's explore some complicated conversions, shall we?
The old corn cost the blood.
The wrong shot led the farm.
The short arm sent the cow.
How can I intimate this to my most intimate friend?
Don't desert me here in the desert!
They were too close to the door to close it.
The buck does funny things when does are present.
Today is 1/1/2015.
Today is Jan 5th, 1992.
It's currently half past 12. Or 12:30PM.
Twenty thousand dollars.
20,000 dollars.
20 thousand dollars.
2^5 = 32.
NASA is an acronym.
This ... is a pause.
EmailAddress@somedomain.com.
1. No "special characters" allowed in passwords when creating an account.
2. ...where's the REST API? I've "added a service" (TTS), but I have to write a webapp to expose it over HTTP? It sure is a different experience than your typical API documentation.
1. This is good feedback, thanks. 2. The REST API docs are at https://www.ibm.com/smarterplanet/us/en/ibmwatson/developerc... You can call the service directly, though the samples show using an HTTP webapp as a proxy to avoid exposing private service credentials. We're still working on the documentation, so feedback is helpful here. What other services' REST API docs do you like, just out of curiosity? What features make that documentation useful?
So the gist of what I'm seeing in this thread is, "Watson's API services aren't very good yet, but they will get better as it collects and processes more data".
So basically, IBM is charging us to provide it with training data to make Watson useful for practical applications. Makes sense, but I can't help feeling that it would be a smarter move to skip charging entirely for now, or to use drastically reduced pricing tiers that exist only to prevent abuse. Releasing a product like this with less-than-impressive demos is a bit of a risk: it's not going to encourage people to use it if the demos aren't compelling, and the demos won't be compelling until a lot of people are using it. I'd err on the side of optimism here; it'll probably work out for the best, but it will be interesting to see how this goes, and it should make a good case study.
My other thought is that if IBM can't get sufficient training data on their own, what hope do the rest of us have? Performing classification on arbitrary data is a herculean task. People could throw literally anything at this API and will expect common-sense results; that's nearly impossible, pushing the boundaries of what even cutting-edge software can do. But if a company like IBM spends billions of dollars and their demos still end up generating mostly confusion and complaints... this kind of open-ended "AI" might be more difficult than even the most conservative experts thought.
EDIT: As an afterthought, the real value here isn't so much the software as it is the pooled training data. Facebook has been able to identify human faces in photos for years; speech-to-text and concept modelling have been around for a long time. What's difficult is getting the labelled data necessary to distinguish "is this a picture of a person or a picture of a cat?". Watson is great, and it seems like IBM has made an investment in acquiring and collecting the data necessary to do that. But their big play here might be to build a product consumer-friendly enough that its users contribute the rest of that data for them over the next several years, building an aggregate dataset worth as much as or more than the software itself. Again, it will be interesting to see how it plays out.
All of the Watson services are free in beta. (Bluemix, through which the services are accessed, requires a credit card after 30 days, but doesn't charge you for use of the beta Watson services.)
We wanted to get the services into people's hands early, even though we're still working on them, rather than wait until we had a perfect product. There's a tradeoff here, but we figure we can improve the services faster and better with public usage and feedback than we could in private isolation.
Since they're free, hopefully people will be able to have some fun playing around with the services, also!
> What's difficult is getting the labelled data necessary to distinguish between "is this a picture of a person or a picture of a cat?". Watson is great and it seems like IBM has made an investment in acquiring and collecting the data necessary to do that.
Are they using more than ImageNet? The ImageNet dataset(s) are not hard to get.
Hey, try changing the classifier from "All" to "Scene"; it does much better. And stay tuned: we will release more APIs on top of visual recognition to allow for image labeling.
This is great! There was a startup, Jetpac (acquired by Google), that was doing CNN-based image recognition, mostly on the mobile client side. They had open-sourced their library: https://github.com/jetpacapp/DeepBeliefSDK
Interesting! We over at Prismatic released our interest tagging API just yesterday ( http://blog.getprismatic.com/interest-graph-api/ ). Seems like there's a lot of opening up APIs going around.
I've been developing a product with Watson from within the Partner Ecosystem. Some of those capabilities are pretty useful. Others, though, are kind of confusing, creating a broad, overpopulated constellation of Watson-based APIs inside Bluemix.
Watson services on Bluemix are currently in beta. You can use the beta services at no charge, even after your 30 day Bluemix trial, although you will need to provide a credit card to Bluemix. You will not incur any charges unless you use any of the production services.
pgeorgi|11 years ago
> If you have any questions, please fire up, the Watson team is ready to answer.
So that's what you built Watson for :-)
Caligula|11 years ago
I tried to access: https://gateway-s.watsonplatform.net:8443/speech-to-text-bet...
I used my Bluemix login/password. It did not work. Are there other API credentials that are needed?
frik|11 years ago
Watson Jeopardy itself is built on top of Apache open source stack (Apache UIMA and Hadoop): http://en.wikipedia.org/wiki/UIMA
qeorge|11 years ago
For example, I searched Google for "photo of girl", and found this image which seems very easy:
http://www.wagggsworld.org/shared/uploads/img/rachel-s-p-pho...
Watson says:
[1] http://visual-recognition-demo.mybluemix.net/
Patrick_Devine|11 years ago
Photo 75%
Shoes 69%
Nature_Scene 69%
Meat_Eater 63%
Object 63%
Mammal 63%
Vertebrate 63%
Cat 63%
Indoors 62%
Room 60%
Person 58%
Color 57%
Judo 54%
Person_View 53%
Human 51%
Leisure_Activity 50%
If you give the classifier a hint (animal), it gives: Meat_Eater 63%, Mammal 63%, Vertebrate 63%, Cat 63%.
So, clearly needs work as a general classifier, but still potentially useful.
m_ke|11 years ago
portrait
youth
fashion
facial expression
women
european
girl
model
female
actress
SlipperySlope|11 years ago
Watson http://text-to-speech-demo.mybluemix.net/
Nuance http://www.nuance.com/for-business/text-to-speech/vocalizer/...
yourapostasy|11 years ago
Vocalware https://www.vocalware.com/index/demo CereProc https://www.cereproc.com/
bhuga|11 years ago
For visual recognition, I used a picture of a snowmobile from http://www.1888goodwin.com/2013/11/14/what-do-you-need-to-do..., which it identified with 73% confidence as "Invertebrate".
Speech to text is a parody twitter account waiting to happen. Here's me asking it how it does with technical transcription:
How do you doing technical words.
If you were going to have to talk about get an jute cushion pull.
And you wanted to discuss the impact on a file server memory.
Issues that cross processes talk about home forks rivers slowed difficult.
cabirum|11 years ago
http://i.imgur.com/V59IeQH.png
z3phyr|11 years ago
> Speech to Text: This application only works in recent versions of Chrome supporting HTML5 audio capture