wpietri | 8 years ago:
I followed the link from the blog post that said "check out the demo on our product website". Then there's a big button that says "TRY IT FREE". Good, I say. That leads me through a signup process that involves credit cards and whatnot, and then dumps me out on what I guess is the equivalent of the AWS console, not some nice audio test page.
So then I root around in the console, finally find the text to speech stuff, and screw around with various interfaces. None of them seems to be the right thing. Eventually I decide I must have missed something, go back to the product website, and scroll down further to find the "convert your speech to text right now". Great, say I.
The blog post explicitly talks about video. I want to see if it can transcribe a talk I did, so I tried uploading a file; nothing appears to happen on Firefox. I try a couple more times. I sigh heavily and switch to Chrome.
It does appear to work on Chrome, but it's entirely infuriating. I tried uploading a video file, which was over 50MB, so it refused. I then figured out how to extract the audio alone and uploaded that, at which point it complained it was over a minute. Then I find another incantation to chop my audio to a minute (which they just should have done for me, and which anyway should be explained in the interface).
Finally, I upload 60 seconds of audio. And nothing fucking happens. After all that, the thing just doesn't work. No error messages, no anything.
This is my first impression of the Google Cloud Platform, and all I hear is the squeaking of clown shoes. I'm sure the rest of it can't be this bad, but if they can't make a simple demo work, I'm unlikely to find out.
How long before we can get a Kodi plugin that transcribes the audio and translates it to the subtitle language you have chosen? I would really be interested in this for Japanese, Korean and Chinese shows, where I sometimes have to wait months or years before fansubs are available. Though thanks to Netflix, English subs are becoming available a lot quicker than before for many of these shows.
there's nothing stopping you from writing it; it's just a few calls to the Google Cloud API, barely an afternoon's work in the scripting language of your choice. the real issue is that I suspect the translation might be a bit off at times
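To make the "few calls" concrete, here is a minimal sketch of the two request bodies such a plugin would send. The endpoint URLs are Google's real Speech-to-Text v1 and Translation v2 REST endpoints; the helper names, language codes, and the final `.srt`-writing step are illustrative assumptions, not a real plugin.

```python
import base64

# Real endpoints (an actual plugin would POST these bodies with an API key):
SPEECH_URL = "https://speech.googleapis.com/v1/speech:recognize"
TRANSLATE_URL = "https://translation.googleapis.com/language/translate/v2"

def build_speech_request(audio_bytes, language_code="ja-JP"):
    """Body for a synchronous recognize call (audio under ~1 minute)."""
    return {
        "config": {
            "encoding": "LINEAR16",         # raw 16-bit PCM
            "sampleRateHertz": 16000,
            "languageCode": language_code,  # language spoken in the show
        },
        # Inline audio must be base64-encoded in the JSON body.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }

def build_translate_request(transcript, target="en"):
    """Body to translate the recognized text into the subtitle language."""
    return {"q": transcript, "target": target, "format": "text"}

# A plugin would then timestamp each result and write the translated
# lines out as an .srt file next to the video.
req = build_speech_request(b"\x00\x01" * 8000)
print(sorted(req["config"]))  # ['encoding', 'languageCode', 'sampleRateHertz']
```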
I wonder how long it will take before countries require telecom companies to transcribe and store all phone calls for a "limited time period" of, let's say, 6 months, for "our security".
And then run algorithms on these texts to classify the conversations into "potentially crime related discussions" classes.
If they force the carriers to do it, then they likely have to deal with subpoenas or other documentation. If the government (of any country) does it themselves, that's way easier - https://infogalactic.com/info/ECHELON
I met Dan at an AI conference, and having worked with the API, I think it's really cool that your average dev has access to this level of transcription; it's a non-trivial problem (I've been working on speech recognition since the early '00s).
I agree with some of the comments regarding Google being a big co & having big co issues. But at the core of it, the team, the offering & attention to what matters is solid.
It's certainly going to open up a whole new realm of possibilities.
the number of voice based startups that have built business logic on top of this fundamental api is staggering. some names: voicera (automated meeting minutes), voiceops (call center call analysis), chorus.ai (phone call analytics)
the focus on improving call center performance is where the money is. plenty more vendors will enter this market.
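For illustration, here is the kind of "business logic on top of the fundamental API" a call-center analytics vendor might ship: a toy talk-time analysis over a diarized transcript. The segment format `(speaker, start_seconds, end_seconds, text)` is a made-up stand-in for whatever the speech API's diarized output gets post-processed into.

```python
def talk_ratio(segments):
    """Fraction of total talk time per speaker, from diarized segments."""
    totals = {}
    for speaker, start, end, _text in segments:
        totals[speaker] = totals.get(speaker, 0.0) + (end - start)
    grand = sum(totals.values()) or 1.0  # avoid div-by-zero on empty calls
    return {s: t / grand for s, t in totals.items()}

# A hypothetical support call, already transcribed and diarized:
call = [
    ("agent", 0.0, 10.0, "Thanks for calling, how can I help?"),
    ("customer", 10.0, 40.0, "My upload keeps failing."),
    ("agent", 40.0, 50.0, "Let's check the file size limit."),
]
print(talk_ratio(call))  # {'agent': 0.4, 'customer': 0.6}
```

A real product would layer keyword spotting, sentiment, and compliance checks on top of the same per-segment structure.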
The Kaldi toolkit is state of the art, but you have to know quite a bit about speech and natural language processing to create a comparable service that works well (or invest the time to learn it). Definitely not plug and play, though.
I use this on a daily basis and it's pretty good but can't cope with fast speech or poor sound quality. For the price I don't expect more but it's not amazing.
The tracking part will be one step away: i.e., it'll "read the transcript" while missing all the subtle inflections and contextual clues from pitch or tone (e.g. sarcasm), which will lead to some hilarious mis-targeting, until LEO (or other authorities) use a similar system and something dreadful happens.
supertrope | 8 years ago:
https://www.theguardian.com/commentisfree/2013/may/04/teleph...
gok | 8 years ago:
Interesting name change. It’s certainly more precise, but was “Speech API” really confusing people?
woodson | 8 years ago:
Then there are implementations of Baidu’s DeepSpeech (PaddlePaddle: https://github.com/PaddlePaddle/DeepSpeech, or Mozilla’s version).