Anyone have experience with the accuracy of Amazon Transcribe vs the Google offerings? (Google Live Transcribe for Android currently tops my list of impressive transcription offerings.)
I built a project where we transcribed speech into chat bubbles for an AR app (Magic Leap). We started with Google Speech-to-Text and swapped to Amazon Transcribe.
Performance-wise they seemed mostly similar - both can pick up text fairly accurately in somewhat noisy environments. We did notice that Google's offering seemed to be slightly more accurate. But the difference was marginal and only noticeable when we compared side-by-side on intentionally poorly pronounced words. There were two big differences that made it impossible for us to use Google's product:
1. Google has a 1 minute limit on streaming speech-to-text. It just closes the connection at 1 minute. It doesn't even send you a "final result", so if speech was being recorded at the time the connection dropped, that transcription is lost. Speaking of which...
2. Google doesn't provide incremental updates. So if someone speaks for a while, you only get an update at the end of it.
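One possible workaround for the time limit in point 1 is to rotate connections yourself before the service closes them. The sketch below only shows the bookkeeping: it groups fixed-length audio chunks into sessions that each stay under a limit. The name `split_into_sessions` and its parameters are hypothetical, and it calls no real speech API. Each yielded session would be sent over a fresh streaming connection.

```python
from typing import Iterable, Iterator, List


def split_into_sessions(
    chunks: Iterable[bytes],
    chunk_seconds: float,
    max_session_seconds: float,
) -> Iterator[List[bytes]]:
    """Group audio chunks into sessions that each stay under the time limit.

    Hypothetical helper: chunk_seconds is the duration of one chunk and
    max_session_seconds is a safety margin below the service's hard limit.
    """
    if chunk_seconds <= 0 or max_session_seconds < chunk_seconds:
        raise ValueError("limits must allow at least one chunk per session")
    max_chunks = int(max_session_seconds // chunk_seconds)
    session: List[bytes] = []
    for chunk in chunks:
        session.append(chunk)
        if len(session) == max_chunks:
            # Session is full: hand it off and start a new one.
            yield session
            session = []
    if session:
        # Trailing chunks that did not fill a whole session.
        yield session


# Ten half-second chunks with a two-second cap per session.
audio = [b"\x00" * 4 for _ in range(10)]
sizes = [len(s) for s in split_into_sessions(audio, 0.5, 2.0)]
print(sizes)  # [4, 4, 2]
```

Cutting at a chunk boundary can still split a word in half, so a real implementation would probably overlap sessions slightly or cut on silence.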
Note that this is their API - my impression is the product they use in their apps is superior in functionality to the product available on Google Cloud.
Amazon Transcribe, on the other hand, has a 4 hour limit and sends incremental updates, so in a longer sentence, like this one I'm currently writing, you would get a message every couple words, which is essential when the goal is to show a live transcription.
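To illustrate why incremental updates matter for chat bubbles, here is a minimal sketch of the client-side logic, assuming each streaming result arrives as a text plus a partial/final flag (Amazon Transcribe's streaming results carry an `IsPartial` field, as far as I know). The `LiveTranscript` class and its method names are made up for this example and are not part of any SDK.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LiveTranscript:
    """Tracks finished chat bubbles plus the one still being spoken."""

    finished: List[str] = field(default_factory=list)
    in_progress: str = ""

    def on_result(self, text: str, is_partial: bool) -> None:
        if is_partial:
            # A partial result replaces the previous guess for this utterance.
            self.in_progress = text
        else:
            # A final result closes the current bubble.
            self.finished.append(text)
            self.in_progress = ""

    def bubbles(self) -> List[str]:
        """Return what the UI should show right now."""
        if self.in_progress:
            return self.finished + [self.in_progress]
        return list(self.finished)


transcript = LiveTranscript()
events = [
    ("so in", True),
    ("so in a longer", True),
    ("So in a longer sentence.", False),
    ("you would", True),
]
for text, is_partial in events:
    transcript.on_result(text, is_partial)
print(transcript.bubbles())  # ['So in a longer sentence.', 'you would']
```

Without partial results, the UI would show nothing until the first final result arrived, which is the problem described in point 2 above.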
I can really recommend Otter.ai for English transcription, and I'm pretty happy with the accuracy. You can tag and listen to each part of the transcribed text. I use it for consuming lengthy YouTube conference talks. They also have a very generous free plan of 10 hours per month. Big fan here :-)
Yes, we have implemented both plus Watson, and at least IMO Amazon's transcription was the best, followed by Watson and then Google. This was honestly a big surprise to me, since I was expecting something more like Google, Amazon, a big gap, and then Watson.
Also, one thing I liked about it, IIRC, is that Amazon was the only service at the time that offered punctuation (to mark the end of a sentence) in English, which is very useful in some cases.
I help organize KubeCon + CloudNativeCon, and we are planning to add live transcription (also known as open captioning) to future events. We also want to offer simultaneous translation to and from Chinese for our Shanghai event. I'd love to see a comparison of the major offerings if anyone has done it. If not, I guess we'll need to.
Disclaimer: I am an organizer of a GDG, but these opinions are my own.
Google has a Cloud Speech-to-Text API that supports streaming audio. Google blows Amazon out of the water here with accuracy, speed, and features at the same price or cheaper. They also have translation APIs.
Listening to that voiced transcript, those "humanlike" sounds of breathing and so on are actually very, very annoying and bring nothing of value to the table. They are actually taking away from the experience, a lot. They are distracting and not natural.
> I love services like Amazon Transcribe. They are the kind of just-futuristic-enough technology that excites my imagination the same way that magic does. It’s incredible that we have accurate, automatic speech recognition for a variety of languages and accents, in real-time.
I personally hate it when I have to use a service for something that could be done locally on my computer or smartphone. And I don't get that fuzzy magical feeling, but instead I think of a (very nearby) dystopian future where a single company knows what all citizens say or do in real time.
Needless to say, I didn't read the rest of the article.
> It’s incredible that we have accurate, automatic speech recognition for a variety of languages and accents, in real-time.
Then a few paragraphs later:
> For real-time transcription, Amazon Transcribe currently supports British English (en-GB), US English (en-US), French (fr-FR), Canadian French (fr-CA), and US Spanish (es-US).
So basically it boils down to two English variants, two French variants, and one US Spanish variant.
And then one wonders why such projects never pick up steam around the world.
[+] [-] iandanforth|6 years ago|reply
[+] [-] rococode|6 years ago|reply
[+] [-] Lindenmayer|6 years ago|reply
[+] [-] superasn|6 years ago|reply
[+] [-] thelazydogsback|6 years ago|reply
You can test the Speech recognition API on the page here: https://azure.microsoft.com/en-us/services/cognitive-service...
[+] [-] dankohn1|6 years ago|reply
[+] [-] zachruss92|6 years ago|reply
I'm more than happy to help out if needed!
https://cloud.google.com/speech-to-text/
[+] [-] pouta|6 years ago|reply
[+] [-] dmix|6 years ago|reply
The browser highlight->speech plugins have always been a bit iffy.
[+] [-] emilfihlman|6 years ago|reply
[+] [-] amelius|6 years ago|reply
[+] [-] pjmlp|6 years ago|reply
[+] [-] philliphaydon|6 years ago|reply