Amazon Transcribe Streaming Now Supports WebSockets

[+] iandanforth|6 years ago|reply

Anyone have experience with the accuracy of Amazon Transcribe vs the Google offerings? (Google Live Transcribe for Android currently tops my list of impressive transcription offerings.)

[+] rococode|6 years ago|reply

I built a project where we transcribed speech into chat bubbles for an AR app (Magic Leap). We started with Google Speech-to-Text and swapped to Amazon Transcribe.

Performance-wise they seemed mostly similar - both can pick up text fairly accurately in somewhat noisy environments. We did notice that Google's offering seemed to be slightly more accurate. But the difference was marginal and only noticeable when we compared side-by-side on intentionally poorly pronounced words. There were two big differences that made it impossible for us to use Google's product:

1. Google has a 1 minute limit on streaming speech-to-text. It just closes the connection at 1 minute. It doesn't even send you a "final result", so if speech was being recorded at the time the connection dropped, that transcription is lost. Speaking of which...

2. Google doesn't provide incremental updates. So if someone speaks for a while, you only get an update at the end of it.

Note that this is their API - my impression is the product they use in their apps is superior in functionality to the product available on Google Cloud.

Amazon Transcribe, on the other hand, has a 4 hour limit and sends incremental updates, so in a longer sentence, like this one I'm currently writing, you would get a message every couple words, which is essential when the goal is to show a live transcription.
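For anyone wiring this into a live display, here is a minimal sketch of how incremental updates might be folded into on-screen text. It assumes each result arrives as a dict with an `IsPartial` flag and an `Alternatives` list whose entries carry a `Transcript` string, which is my understanding of the Transcribe streaming result shape. The `LiveCaption` class and its method names are hypothetical, not part of any SDK.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LiveCaption:
    """Folds streaming transcript results into the text to display."""

    finalized: list[str] = field(default_factory=list)  # committed utterances
    pending: str = ""  # latest guess for the utterance still in progress

    def apply(self, result: dict) -> str:
        # Assumed shape: {"IsPartial": bool, "Alternatives": [{"Transcript": str}]}
        alternatives = result.get("Alternatives") or []
        text = alternatives[0]["Transcript"] if alternatives else ""
        if result.get("IsPartial", False):
            # A partial result replaces the previous guess for the same utterance.
            self.pending = text
        else:
            # A final result commits the utterance and clears the pending guess.
            if text:
                self.finalized.append(text)
            self.pending = ""
        return self.text()

    def text(self) -> str:
        # Displayed text is every committed utterance plus the current guess.
        parts = self.finalized + ([self.pending] if self.pending else [])
        return " ".join(parts)
```

Calling `apply` on every incoming result and redrawing with the returned string keeps the display current while someone is still mid-sentence. A service that only reports at the end of an utterance would leave `pending` empty until then.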

[+] Lindenmayer|6 years ago|reply

I can really recommend Otter.ai for English transcription. I'm pretty happy with the accuracy, and you can tag and listen to each part of the transcribed text. I use it for consuming lengthy YouTube conference talks. They also have a very generous free plan of 10 hours per month. Big fan here :-)

[+] superasn|6 years ago|reply

Yes, we have implemented both plus Watson, and at least IMO Amazon's transcription was the best, followed by Watson and then Google. That was honestly a big surprise to me, since I was expecting something more like Google, Amazon, a big gap, and then Watson.

Also, one thing I liked about it, IIRC, is that Amazon was the only service at the time that offered punctuation (to mark the end of a sentence) in English, which is very useful in some cases.

[+] thelazydogsback|6 years ago|reply

Any reason Microsoft Cognitive Services is missing from the discussion?

You can test the speech recognition API on the page here: https://azure.microsoft.com/en-us/services/cognitive-service...

[+] dankohn1|6 years ago|reply

I help organize KubeCon + CloudNativeCon, and we are planning to add live transcription (also known as open captioning) to future events. We also want to offer simultaneous translation to and from Chinese for our Shanghai event. I'd love to see a comparison of the major offerings if anyone has done it. If not, I guess we'll need to.

[+] zachruss92|6 years ago|reply

Disclaimer: I am an organizer of a GDG, but these opinions are my own.

Google has a cloud speech-to-text API that supports streaming audio. Google blows Amazon out of the water here with accuracy, speed, and features at the same price or cheaper. They also have translation APIs.

I'm more than happy to help out if needed!

https://cloud.google.com/speech-to-text/

[+] pouta|6 years ago|reply

I can help you guys with that. I will send you an email.

[+] dmix|6 years ago|reply

This sounds great. I hope to see this being adopted by blogs and news sites!

The browser highlight->speech plugins have always been a bit iffy.

[+] emilfihlman|6 years ago|reply

Listening to that voiced transcript, those "humanlike" sounds of breathing and so on are actually very, very annoying and bring nothing of value to the table. They are actually taking away from the experience, a lot. They are distracting and not natural.

[+] amelius|6 years ago|reply

> I love services like Amazon Transcribe. They are the kind of just-futuristic-enough technology that excites my imagination the same way that magic does. It’s incredible that we have accurate, automatic speech recognition for a variety of languages and accents, in real-time.

I personally hate it when I have to use a service for something that could be done locally on my computer or smartphone. And I don't get that fuzzy magical feeling, but instead I think of a (very near) dystopian future where a single company knows what all citizens say or do in real time.

Needless to say, I didn't read the rest of the article.

[+] pjmlp|6 years ago|reply

> It’s incredible that we have accurate, automatic speech recognition for a variety of languages and accents, in real-time.

Then a few paragraphs later:

> For real-time transcription, Amazon Transcribe currently supports British English (en-GB), US English (en-US), French (fr-FR), Canadian French (fr-CA), and US Spanish (es-US).

So basically it boils down to two English variants, two French variants, and one US Spanish variant.

And then one wonders why such projects never pick up steam around the world.

[+] philliphaydon|6 years ago|reply

So you’re saying that Amazon should release this with support for over 100 languages on day 1?
