item 41347637


shade | 1 year ago

Speaking as someone who's deaf and uses these services a lot: for speech to text, the AI stuff is getting rather good.

I'm not saying it's perfect for every situation, but I have a very high success rate using InnoCaption[0] for captioned phone calls, including calls to places like restaurants with a lot of background noise. InnoCaption offers both live-person and AI-based captioning; since they started offering the AI-based option I've left it on, and I've never had to switch to a human operator to continue a conversation.

That said, I wasn't born deaf (I lost my hearing in elementary school), so I voice for myself, and that does simplify the process. I have used the old-school text-only relay services, and that was always such a miserable experience for me that I would crawl over broken glass to avoid making phone calls, especially ones that involve navigating phone trees. That's one area where relay operators still have a major advantage. IIRC, Google's Pixel phones are supposed to be able to navigate phone trees for you, but since I use iOS I have no personal experience there.

[0] https://www.innocaption.com/



retrac|1 year ago

I can't really understand speech these days without captions to go with it, but I run into discrepancies in AI-generated captions very often. That is, I heard something, I know from context that I'm right, and the AI is wrong. Whisper and other deep-learning-based speech systems in particular can generate very plausible misinterpretations: text that sounds similar and is grammatically plausible but is not what was said. It's the kind of error a person with a semantic understanding of the conversation would not make, so I am a little leery of these systems for that reason. I rely on them every day to generate captions for video and so on, but I haven't found any iteration I've tried reliable or comfortable enough for interactive use.

JohnFen|1 year ago

> I encounter discrepancies with AI generated captions very often. As in, I heard something and from context I know I'm right and the AI is wrong.

I've been noticing this as well. It's becoming a common problem. Also, many times I've noticed that if I hadn't heard the speech being captioned and only had the captioning to go by, I would have had little chance of correctly understanding what was actually said.

rohansingh|1 year ago

The phone tree stuff on Pixel is decent but nowhere near 100% reliable or robust.

If it hears and understands an automated system reading out a phone tree, it will list the options on screen and you can tap on them. It usually works, but it often fails to recognize that a phone tree is happening. Other times it recognizes the phone tree but mistranscribes the options.

As a non-deaf person, it's a handy UX improvement. But I wouldn't recommend that anyone rely on it.

ensignavenger|1 year ago

These services are indeed great for those who need them. I received one or two relay calls years ago when I worked at a computer shop. Unfortunately, the callers were always scammers abusing the system.