a-r-t | 11 months ago

Hi Jeff, are there any plans to support dual-channel audio recordings (e.g., Twilio phone call audio) for speech-to-text models? Currently, we have to either process each channel separately and lose conversational context, or merge channels and lose speaker identification.

jeffharris|11 months ago

this has been coming up often recently. nothing to announce yet, but when enough developers ask for it, we'll build it into the model's training

diarization is also a feature we plan to add

a-r-t|11 months ago

Glad to hear it's on your radar. I'd imagine phone call transcription is a significant use case.

ekzy|11 months ago

I’m not entirely sure what you mean, but Twilio recordings already support dual channels.

a-r-t|11 months ago

I mean transcribing Twilio's dual-channel recordings with OpenAI's speech-to-text while preserving which channel each utterance came from.
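
For anyone following along, the "process each channel separately" workaround from the top of the thread might look roughly like the sketch below. It uses only the Python standard library and assumes the recording is an uncompressed 2-channel PCM WAV. The function names, the speaker labels, and the `(start_seconds, text)` segment shape are all hypothetical, and the sketch does not call any transcription API. It only helps recover conversational order if whatever transcribes each mono file returns per-segment timestamps.

```python
import io
import wave


def split_stereo_wav(stereo_bytes: bytes) -> tuple[bytes, bytes]:
    """Split a 2-channel PCM WAV into two mono WAVs (first channel, second channel)."""
    with wave.open(io.BytesIO(stereo_bytes), "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a 2-channel recording")
        width = src.getsampwidth()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # PCM frames are interleaved: one sample per channel, repeated.
    frame_size = 2 * width
    first, second = bytearray(), bytearray()
    for i in range(0, len(frames), frame_size):
        first += frames[i:i + width]
        second += frames[i + width:i + frame_size]

    def to_wav(pcm: bytes) -> bytes:
        buf = io.BytesIO()
        with wave.open(buf, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(pcm)
        return buf.getvalue()

    return to_wav(bytes(first)), to_wav(bytes(second))


def merge_transcripts(
    channels: dict[str, list[tuple[float, str]]],
) -> list[tuple[float, str, str]]:
    """Interleave per-channel segments by start time, tagging each with its label.

    Assumption: each segment is (start_seconds, text); the labels are made up
    by the caller, since which channel holds which party depends on the call.
    """
    merged = [
        (start, label, text)
        for label, segments in channels.items()
        for start, text in segments
    ]
    merged.sort(key=lambda seg: seg[0])
    return merged
```

Each mono file would be transcribed on its own, and `merge_transcripts` then stitches the two results back into a single timeline. That keeps the channel label per utterance, but the model still never sees both sides of the conversation at once, which is the limitation the original question is about.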