Live-caption glasses let deaf people read conversations [video]

[+] quantumquetzal|3 years ago|reply

Background: researched this space for a graduate degree.

There are a few issues that are unanswered by this video (which isn't intended to be a technical deep dive, but I don't see any related links in the video description):

1. How do these glasses handle multiple simultaneous speakers? Based on the display I saw, it shows the speakers' words sequentially, which starts to fall apart in real-world environments, especially group conversations. This is a big problem, and wider adoption is contingent on handling this elegantly.

2. These appear to be the classic "smart glasses" display style that's pervasive in consumer head-worn displays today, where content is projected at a fixed depth in front of the wearer. Because the captions aren't anchored at the same focal distance as the speaker, the wearer's eyes will swap between the captions and the speaker's faces, which is a tiring activity, and can make the wearer feel like they're not part of the conversation or being rude.

3. As mentioned by another commenter, this is a useful idea for people who lose their hearing later in life. That said, this is less (although certainly still) useful for people who have congenital hearing loss and primarily communicate via ASL.

All in all, it's exciting to see growing interest in this space, as it's easily extendable to people learning a new language or navigating a foreign country. I think offloading the speech-to-text to a tethered mobile device is a good choice (though it would be nice to do low-latency wireless transmission).

[+] DrKeithDuggar|3 years ago|reply

Thank you for looking at XRAI Glass!

1. For multiple simultaneous speakers of comparable volume, it’s only as good as the underlying speech-to-text engines we’ve implemented/integrated, which is currently not very good. It’s active area of research and engineering for us and we believe we’ll make strides to improve things; but, as you rightly point out, solving the crosstalk problem is very difficult. For the more general so-called "cocktail party problem", we can do a good job of filtering out more distant/lower volume voices and other environmental noise. Choosing the right microphone can improve things further, for example by pairing a noise canceling Bluetooth lapel mic.

2. We allow one to project the subtitles at varying depth, within the capabilities of glasses. We're seeing an effective focal depth range for fixed apparent size of about 0.5m to 3m. If one also allows change in apparent size, to simulate perspective scaling, the range is higher.

[+] aeturnum|3 years ago|reply

This is a classic curbcut in the sense that it will help those with heading as much (if not more than) the D/deaf community. Still very excited for it - agree with all your questions and concerns.

As usual, this marketing seems most directed at normate ideas about what disabled people want / need, but the tech seems very cool and there does seem to be potential. Without looking into the product deeply it seems like there are D/deaf people on the team, which gives me hope.

I do wish that we would just embrace the idea that using machines to make information available in many mediums is something all people can use and appreciate.

[+] p1necone|3 years ago|reply

Is multiple simultaneous speakers at the same volume/distance actually an important problem to solve? I already can't have a conversation if that's happening and my hearing is fine.

[+] nostrebored|3 years ago|reply

Is low latency wireless transmission feasible now?

The last time I worked on anything here, there were a number of problems with transmitting highly compressed, low resolution video data. The consumer devices could not handle just sending the packets. In my project we were annotating real time video and only sending back the annotations, but even that would cause devices to overheat and the applications to fail in really interesting ways.

[+] gumby|3 years ago|reply

I have a different use case, involving wearing them in the house: listen to what my girlfriend says, use some ML to analyze whether I need to know it and if so, put it on the display.

She has a different "speech mode" than I: she speaks while I'm reading or washing the dishes or whatever, and sometimes it's to herself, sometimes to Siri, and sometimes it's something she wants me to know.

[+] flangola7|3 years ago|reply

The Android voice recording app can already write a transcript in real time that indicates different speakers.

[+] ape4|3 years ago|reply

It would be nice if it put the captions over the speaker (in a speech bubble?)

[+] justinator|3 years ago|reply

>3. As mentioned by another commenter, this is a useful idea for people who lose their hearing later in life. That said, this is less (although certainly still) useful for people who have congenital hearing loss and primarily communicate via ASL.>

Someone primarily communicates with ASL and then there's me that doesn't know ASL. I can speak to them, and they can read what I've spoken. That works pretty well. They communicate with me via text to speech, or (I guess in the near future) ASL to speech - however that will work.

I mean, that's awesome.

[+] elil17|3 years ago|reply

I'm a hearing person and I've spent a summer interning in a 50/50 mixed Deaf and hearing research group.

My take is that this is a huge UI improvement for AI speech to text, which a lot of Deaf people are already using to listen to conversations. It seems particularity great because it allows this technology to provide situational awareness while, for example, walking.

It's important to remember, though that, for conversations where you're trying to include a Deaf person who isn't good at speaking or chooses not to speak, speech to text is a fundamentally unequal communication modality. They will be able to "receive" but they won't be able to "transmit", which makes for extremely lopsided conversations. There is no substitute for taking the time to learn a sign language or to have conversations via writing (no substitute for sign language as it requires a lot of patience from both parties).

[+] hinkley|3 years ago|reply

We get this wrong even for hearing people in a lot of situations. Conference calls in particular, the people in the room have a different experience from those on the call. Side conversations in a single room can be disruptive, but over a phone the secondary conversation and the primary can turn into an unintelligible mash.

Hard of hearing people have the same problem in person. We aren't really at a place yet where someone wearing hearing aids still has 3d hearing. So like on a conference call, they can't figure out what's going on when four people are talking at once.

I know a partially deaf kid who prefers socializing on Discord, because of this. Everyone is equal and all conversations have to be 2 people at a time.

[+] jackblemming|3 years ago|reply

Why can’t they also make glasses that translate sign language into English audio or text then?

[+] danscarfe|3 years ago|reply

Thanks for the shout-out! To try and answer your questions:

1) The cocktail party problem is still a WIP. This is a very hard problem to solve.

2) These are not 'viewer' glasses they are 3DOF glasses which support moving and pinning the subtitles in 3D space

3) Whilst we targeted the Deaf and HOF to begin, we see broad applicability beyond this

4) You don't need glasses to test it, just an Android 12+ phone. Download and try it. We'd love your feedback https://link.xrai.glass/app

[+] ogig|3 years ago|reply

Hi there: Some feedback from a sing language interpreter, my wife, as I showed her this.

> 4) You don't need glasses to test it, just an Android 12+ phone.

This exactly pointed her. Some deaf people will use any dictation software on the phone, and look at the phone when needed. This glasses instead will cover some field of view. Note that view for deaf people is more important and used than for the rest of us. She couldn't see the improvement of using bulky glasses instead of lowering your eyes to the phone.

Personally I think the endeavor is admirable and wish you best of luck. Also, as other comments say, I think this product might be more desirable for HOF and late in life hearing loss sectors than born deaf people.

[+] autoexec|3 years ago|reply

What are the privacy/security issues with this? Does this mean every conversation a person wearing these has (or that occurs within earshot) is being collected and harvested by someone? Will the AR be used to insert ads into people's conversations or plaster images of ads all over the place? Will certain words or phrases be automatically censored?

This is cool tech, that could be used to help people, but it comes with lots of potential for new forms of evil that were not possible without it. Considering that I can't remember the last time I bought a product using a new technology that wasn't also designed to work against my interests, I'm immediately skeptical of any device that can't be used offline and especially one that requires being connected to cell phone apps.

[+] danscarfe|3 years ago|reply

We've taken a privacy-first approach here. All data is only ever stored on the device, owned by the user, inaccessible to us. It's only ever transcribing when a user asks it to and only stored if the user asks it to be. We don't censor anything. We are soon to release purely on-device transcription, but the quality of this is still not as good as the cloud providers offer. The app itself is what powers the glasses, they are just output devices.

[+] Sunspark|3 years ago|reply

It's not that easy to pick up every speech utterance in a wide range and separate it by speaker. The further away the sound is, the less intelligible it is. An artificial non-directional microphone is unlikely to pick up with the same clarity or distance as with your own ears. There should not be any privacy concern with having a microphone ear vs a biological ear. If there is a concern the best way to manage it is to not talk about confidential topics with other people in the room.

There won't be any ads, because people would just use their phones instead which don't have ads. Specifically Google Live Transcribe, Otter and the like. Those require a data connection to the network, but there are versions that don't need the network at all. E.g. Chrome's Live Caption option. Eventually as technology becomes more power efficient and miniaturized it won't need to be paired to a phone.

The advantage of glasses is that people find it very distracting seeing a phone scrolling away, my GP stares at the phone instead of me because he is fascinated by it. Sometimes you can't be holding a phone up if something is being worked on. The glasses would also allow for a bit more directionality. It's a promising tool depending on how well it is implemented.

[+] primax|3 years ago|reply

I don't see how this is inherently worse than mobile phones, which are carried by everyone in the developed world all the time.

[+] lazyeye|3 years ago|reply

Completely agree. There seems to be such a gap in the market for privacy-first hardware but very few companies are doing anything.

[+] vlunkr|3 years ago|reply

> Will the AR be used to insert ads into people's conversations or plaster images of ads all over the place?

I doubt it. This isn't the kind of cheap mass-market device where running ads is going to make you a big profit.

[+] Rustwerks|3 years ago|reply

The implications are approximately the same as https://www.amazon.com/tape-recorder/s?k=tape+recorder except with compute power to make storage and search more convenient. Except that the glasses/app don't save anything by default nor do they need to use the internet.

I suppose it would be a nice feature if they saved all of your conversations for later? The translations are too imperfect for any legal matter use.

[+] WastingMyTime89|3 years ago|reply

Reading the comments here, I think people are missing how many people are losing hearing while aging and how alienating it is. Even if it only works with one speaker at a time, it could mean a massive quality of life improvement for a large and growing share of the population.

Sure it won’t solve the issues faced by the deaf community but that’s only a tiny portion of the people handicapped by difficulty hearing.

[+] TRiG_Ireland|3 years ago|reply

All these "aids" always have to be born by the Deaf people, not by the hearing. At least these have some chance of actually working, I suppose, unlike those ridiculous gloves one sometimes sees celebrated.

[+] sigwinch28|3 years ago|reply

I hope (reatively immature) solutions like this will not be used as an excuse to remove accessible infrastructure from the world (e.g. captions at the cinema, live subtitles at the theatre, text displays on public transport).

[+] drjasonharrison|3 years ago|reply

There is a huge population group that I am hoping will demand and make accessibility much more refined. Everything from the size of text to lighting levels to subtitles

[+] captain3d|3 years ago|reply

The number one thing these glasses/software need to solve is that the words match the speech for a one to one conversation in a quiet environment. Eg doctors visit. I think they are very close.

We just got the Nreal/Xrai setup a few days ago for deaf from birth wife (hearing husband) She grew up lipreading but integrated more with signing and deaf community as an adult. She has a cochlear implant but can not understand language from sound alone. And really doesn't enjoy hearing that much unless we are watching a movie etc where the sound is 100% linked to the visual.

Initial reaction to the setup is. 1. Impressed, hopeful, excited 2. A bit complicated technically. More stuff to deal with. not an everyday thing. 3. Phone battery usage high. Maybe 3 - 4 hours 4. In the right situation they will be really powerful. 5. Need more control over the interface eg show/hide 'listening' icon etc. Can be distracting. Move subtitle position (maybe you can) 6. Processing delay can make you more an observer of the conversation. Response time is delayed enough to interrupt the flow a conversation. (satellite tv connection interviews)

The number one barrier to using them is having everything ready for the moment they are needed. You need to plan ahead. Takes a few mins to set up.

All the other high end ideas can be set aside while the core function is dialed in.

We really appreciate the effort and hope to contribute.

[+] DrKeithDuggar|3 years ago|reply

Thank you for this feedback! Btw, we'll support better adjustment of the subtitle position very soon. We did just add many additional font size options as well. If you haven't already, please consider joining our Discord server to provide feedback at any time: https://discord.gg/7HjyDJ3JAz

[+] 1-6|3 years ago|reply

Can they make the glasses without tint?

I can imagine users going, "I'm deaf, not blind."

Magic Leap has screens that can adjust opacity at the pixel-level.

[+] danscarfe|3 years ago|reply

The current generation of glasses (such as these Nreal Airs) use a birdbath technology which requires a tint. The next generation of waveguide glasses won't require this but they are currently not as good visual quality, especially for reading text. They are also a lot more money. Everything is a trade off right now. The next couple of years will be transformational.

[+] baseline-shift|3 years ago|reply

I am pretty deaf yet somehow have wound up interviewing people for a living (go figure).

I currently use googlemeet and recently switched to Otter.ai for recording/transcription. Unlike the previous transcription tools for journalists that I used in the past - Otter.ai generates the text live on the laptop screen while we are talking and even corrects itself to make sense as the speaker reveals contextual clues.

It is a huge help, and I had wished there was something for real life conversations like this is for screen conversation.

Good news for deaf people. You only have to watch newscasters speaking through a pasted-on permanent grin - as if every word is ee - to know that lip reading is garbage.

[+] giraffe_lady|3 years ago|reply

Very cool, especially for people who lose their hearing later in life. For other deaf people it's important to remember that written english is not a form of their native sign language¹, so this would be like (because it is) reading captions in a second language. Still potentially useful but with more limitations. Not that there's necessarily a technological way around those limitations either.

¹ Afaik this applies to all other sign languages outside english too. Signed Exact English exists and probably other-language equivalents too but I've never met a native speaker.

[+] norgie|3 years ago|reply

It is true and something more people should understand that ASL is not a signed version English, but most ASL speakers are pretty much bilingual. They are taught to read English, and most places also encourage learning to speak English, generally with speech-language pathologists, though some in the community are understandably reluctant.

[+] generalizations|3 years ago|reply

The problem with ASL compared to lip reading is that it's a form of self-segregation, limiting the deaf person to primarily communicating only with other people who know ASL. If these glasses are effective, it could help bridge that gap.

[+] gedy|3 years ago|reply

> Not that there's necessarily a technological way around those limitations either.

I can see at some point here being able to wear AR glasses that overlay hand signing over the speaker

[+] spullara|3 years ago|reply

Folks that use ASL use fingerspelling which is of course just written english, no?

https://www.lifeprint.com/asl101/fingerspelling/fingerspelli...

[+] unknown|3 years ago|reply

[deleted]

[+] tossaway0|3 years ago|reply

I admittedly don’t have much experience with deaf people; I had one acquaintance in high school who was deaf. Hanging out with him made me very aware of how isolating it can be to only be able to participate in conversations where you are actively trying to pay attention.

If these can let people hang out and participate without having to actively track each speaker in a group setting it will go a long way.

[+] wbm1|3 years ago|reply

This has the potential to be the killer app for mixed reality once instant / realtime translation is possible. Imagine being able to understand every language in the world - and if two users of this product meet, being able to converse without learning each other's language.

It would be a technological Babelfish.

[+] eclipxe|3 years ago|reply

https://www.youtube.com/watch?v=SQd394a4qEo

[+] flanbiscuit|3 years ago|reply

Anyone know the status of these Google AR glasses?

The form factor of Google's AR glasses look much much closer to a normal pair of glasses than the glasses in the top video (which look like heavy sunglasses with a wire connecting to your phone)

You posted a CNET link, here's a direct link to Google's: https://www.youtube.com/watch?v=lj0bFX9HXeE

In the description it says: "This device has not been authorized as required by the rules of the Federal Communications Commission. This device is not, and may not be, offered for sale or lease, or sold or leased, until authorization is obtained."

So I guess they might be waiting for that?

[+] polygot|3 years ago|reply

It's powered by an app apparently. Could be interesting to connect it to GPT-4 (e.g., someone asks you a question and then you could just tell them the answer from GPT.)

Or, if it had OCR capabilities, you could just hold a sheet of paper in front of yourself and say "what's this?" and it would explain the text to you.

[+] ghotli|3 years ago|reply

So I dug around a little bit and figured I'd just ask.

As just some guy on the internet, can I buy one of these and write a hello world to have text show up in front of my eyes of my own choosing? Does it have an API or will it?

[+] danscarfe|3 years ago|reply

You can indeed. It's called the Nreal sdk

[+] Blackthorn|3 years ago|reply

This is the sort of thing I expected Google Glass to be able to do. But Google apparently lacks sufficient imagination, so they just unceremoniously canned it. Really hope this takes off.

[+] dymk|3 years ago|reply

Google literally announced this feature 10 months ago at I/O 2022

[+] zoklet-enjoyer|3 years ago|reply

I was so excited for the possibilities of AR when I first heard about Google Glass. I imagined navigating foreign cities with signs auto-translated to English, turn by turn directions, translated subtitles, etc

[+] allanrbo|3 years ago|reply

Bought a pair of these glasses, Nreal Air, a few months ago. I find them useful for laptop coding without straining my neck. It's awesome to see more creative use cases for them!

[+] ElijahLynn|3 years ago|reply

Video doesn't have a link to product. Here it is: https://xrai.glass/

176 comments