top | item 42937861

Fantastic question. Our view is that the higher the bandwidth of the communication, the more useful it will be. We've moved from IRC -> VoIP -> video because of the efficiency of information transfer and, additionally, the empathic element of face-to-face conversation.

From the technical side, speech-to-speech models have more potential for accuracy (no explicit ASR step, so no information lost in an audio -> text conversion). We have a few options for mimicking nonverbal elements: we could decide when to naturally mix in the original audio, or train our end-to-end model to handle those nonverbal audio chunks. We'll be trying both, but the first option will likely come sooner!
