item 44887121

martzoukos | 6 months ago
I guess that there is no streaming option for sending generated tokens to, say, an LLM service to process the text in real time.

nomad_horse | 6 months ago
Whisper has an encoder-decoder architecture, so it's hard to run it efficiently in streaming mode, though whisper-streaming is a thing. https://kyutai.org/next/stt is natively streaming STT.

woodson | 6 months ago
There are many streaming ASR models based on CTC or RNNT. Look for example at sherpa (https://github.com/k2-fsa/sherpa-onnx), which can run streaming ASR, VAD, diarization, and more.
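A toy sketch of why CTC-based models stream more naturally than an encoder-decoder like Whisper: greedy CTC decoding needs only the previous frame's token id as carry-over state, so decoding a stream chunk by chunk gives the same output as decoding it offline. The frame ids below are made up for illustration and do not come from any real model.

```python
BLANK = 0  # conventional CTC blank id

def ctc_greedy_stream(frames, prev_last=BLANK):
    """Greedy CTC decode over a chunk of per-frame argmax token ids.

    Collapses repeats and drops blanks; prev_last carries the last
    frame id across chunk boundaries so streaming matches offline
    decoding. Returns (emitted_tokens, new_last) for the next chunk.
    """
    out = []
    last = prev_last
    for t in frames:
        if t != BLANK and t != last:
            out.append(t)
        last = t
    return out, last

# Offline decode of the whole (made-up) frame sequence...
frames = [0, 3, 3, 0, 5, 5, 5, 0, 3]
offline, _ = ctc_greedy_stream(frames)

# ...equals chunk-by-chunk streaming decode with carried state.
streamed, last = [], BLANK
for chunk in (frames[:4], frames[4:7], frames[7:]):
    toks, last = ctc_greedy_stream(chunk, last)
    streamed += toks

print(offline, streamed)  # same token sequence either way
```

An encoder-decoder model has no such trivial per-frame carry-over: the decoder attends over the whole encoded utterance, which is why projects like whisper-streaming resort to re-running inference on a growing buffer.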