top | item 45977883


chrischen | 3 months ago

Real-time translation is a really good use case. The problem is that most implementations, such as AirPods Live Translation, aren't great.


makeitdouble | 3 months ago

I've been in many situations where I wanted translations, and I can't think of one where I'd actually want to rely on either glasses or the airpods working like they do in the demos.

The crux of it for me:

- If it's not a person, it will be out of sync; you'll be stopping it every 10 seconds to get the translation. You might as well use your phone. It would be the same, and there's a strong chance the media is already playing from there, so having the translation embedded would be an option.

- With a person, the other person needs to understand when your translation is in progress and when it's over, so they know when to expect an answer or when they can go on. Having a phone in plain sight is actually great for that.

- The other person has no way to check whether your translation is completely out of whack. Most of the time they have some vague understanding of your language, even if they can't really speak it. Having the translation in the glasses removes any possibility of that check.

There are a ton of smaller points, but all in all, the barrier for a translation device to become magic and just work plugged into your ear or glasses is so high that I don't expect anything to beat a smartphone within my lifetime.

chrischen | 3 months ago

Some of your points are already addressed by current implementations. AirPods Live Translation uses your phone to display what you say to the other person, and the other person's speech is played to your AirPods. I think the main issue is that there is a massive delay, and Apple's translation models are inferior to ChatGPT. The other thing is that the AirPods don't really add much; it works the same as if you had the translation app open and both people were talking to it.

Aircaps demos show it to be pretty fast, almost real time. Meta's live captioning works really fast and is supposed to be able to pick out who is talking in a noisy environment by having you look at the person.

I think most of your issues are just a matter of the models improving and running faster. I've found translations tend not to be out of whack, but this is something that can't really be solved except by better translation models. In the case of AirPods Live Translation, the app shows both people's text.

jhugo | 3 months ago

I have the G1 glasses, and unfortunately the microphones are terrible, so the live translation feature barely works. Even if you sit in a quiet room and try to make conditions perfect, transcription accuracy is very low. If you try to use it out on the street, it rarely gets even a single word right.

chrischen | 3 months ago

This is the sad reality of most of these AI products: they're just stacking poor feature implementations on the hardware. It seems like if they just picked one of these features and did it well, the glasses would be useful.

Meta has a model just for isolating speech in noisy environments (the "live captioning" feature), and it seems that's also the main feature of the Aircaps glasses. Translation is a relatively solved problem; the issue is isolating the conversation.

I've found Meta is pretty good about not overpromising features, and as a result, even though they probably have the best hardware and software stack of any glasses, the things you can do with the Ray-Ban displays are extremely limited.