christiansafka | 1 year ago
From the technical side, speech-to-speech models have more potential for accuracy (no explicit ASR step, so no audio->text information loss). We have a few options for mimicking nonverbal elements: we could decide when to naturally mix in the original audio, or train our end-to-end model to handle those nonverbal audio chunks. We'll be trying both, but likely the first option sooner!
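To make the first option concrete, here is a rough sketch of mixing the original audio back in over nonverbal spans, not our actual pipeline. It assumes the generated and original tracks are already time-aligned and the same length, and that the nonverbal spans (laughs, sighs, etc.) come from some detector that isn't shown. The function name `mix_in_original` and the `fade` parameter are made up for illustration.

```python
def mix_in_original(generated, original, segments, fade=4):
    """Crossfade `original` into `generated` over each (start, end) sample span.

    Assumptions (hypothetical, for illustration only):
    - both tracks are equal-length sequences of float samples, time-aligned
    - `segments` holds half-open (start, end) index pairs for nonverbal spans
    - `fade` is the ramp length in samples at each edge of a span
    """
    if len(generated) != len(original):
        raise ValueError("tracks must be the same length")
    out = list(generated)
    for start, end in segments:
        for i in range(start, end):
            # Distance (in samples) to the nearest edge of the span, 1-based.
            edge = min(i - start + 1, end - i)
            # Weight of the original audio ramps up to 1.0 over `fade` samples.
            w = min(1.0, edge / fade)
            out[i] = w * original[i] + (1.0 - w) * generated[i]
    return out


# Toy example: silence as the generated track, a constant signal as the original.
generated = [0.0] * 10
original = [1.0] * 10
mixed = mix_in_original(generated, original, segments=[(2, 8)], fade=2)
```

The linear ramp is just the simplest way to avoid a click at the splice points; a real system would likely need to handle timing drift between the two tracks as well.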