(no title)
chrischen | 3 months ago
Aircaps demos show it to be pretty fast, almost real-time. Meta's live captioning works really fast and is supposed to be able to pick out who is talking in a noisy environment by having you look at that person.
I think most of your issues are just a matter of the models improving and running faster. I've found the translations tend not to be out of whack, but that's something that can't really be solved except by better translation models. In the case of AirPods Live Translation, the app shows both people's text.
makeitdouble | 3 months ago
I do see real improvements in the models, but for IRL translation I think phones are already very good at this, and improving from there will be exponentially difficult.
IMHO it's the same for "bots" intervening in meetings (commenting on or reacting to exchanges, etc.). Interfacing multiple humans in the same scene is always a delicate problem.