Microsoft releases VibeVoice-ASR, an open speech-to-text model

3 points | putlake | 1 month ago | github.com

1 comment

putlake | 1 month ago

VibeVoice-ASR is a unified speech-to-text model designed to handle 60-minute long-form audio in a single pass. It generates structured transcriptions containing Who (speaker), When (timestamps), and What (content), and supports customized hotwords.
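
To make the Who/When/What structure concrete, here is a minimal Python sketch of how such a transcript could be represented and printed. This is not VibeVoice-ASR's actual output format or API: the `Segment` class, its field names, and the `render` and `format_timestamp` helpers are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One transcript entry. Field names are hypothetical, not the model's schema."""
    speaker: str  # Who: speaker label
    start: float  # When: start time in seconds
    end: float    # When: end time in seconds
    text: str     # What: transcribed content


def format_timestamp(seconds: float) -> str:
    """Render a time offset as MM:SS, truncating fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render(segments: List[Segment]) -> str:
    """Return the segments as one line each, ordered by start time."""
    lines = []
    for seg in sorted(segments, key=lambda s: s.start):
        span = f"{format_timestamp(seg.start)}-{format_timestamp(seg.end)}"
        lines.append(f"[{span}] {seg.speaker}: {seg.text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data, not real model output.
    demo = [
        Segment("Speaker 2", 65.0, 70.5, "Sure."),
        Segment("Speaker 1", 0.0, 4.2, "Hello"),
    ]
    print(render(demo))
```

Keeping speaker, time span, and text together in one record means a 60-minute transcript can be sorted, filtered by speaker, or sliced by time without re-parsing the text.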