(no title)
chrischen | 3 months ago
Aircaps demos show it to be pretty fast, almost real-time. Meta's live captioning works really fast and is supposed to be able to pick out who is talking in a noisy environment by having you look at that person.
I think most of your issues are just a matter of the models improving and running faster. I've found the translations tend not to be out of whack, but that's something that can't really be solved except by better translation models. In the case of AirPods Live Translation, the app shows both people's text.
makeitdouble | 3 months ago
I do see real improvements in the models, but for IRL translation I think phones are already very good at this, and improving from there will be exponentially difficult.
IMHO it's the same for "bots" intervening in meetings (commenting on or reacting to exchanges, etc.). Interfacing multiple humans in the same scene is always a delicate problem.