Parakeet does streaming, I think, so if you throw enough compute at it, it should be fast enough. The closest competitor is Whisper v3, which is relatively slow; maybe Voxtral, but it's still very new.
The Python MLX version of Parakeet does indeed support streaming: https://github.com/senstella/parakeet-mlx
It requires modifying the inference algorithm. In this implementation, I see the author even uses a custom Metal kernel to get maximum performance.
Parakeet's batch inference logic is simple, but streaming may require some effort to get the best performance. It's not only a dependency issue.
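To illustrate the kind of modification involved, here's a minimal sketch of wrapping a batch-style transcriber for chunked streaming by carrying left-context overlap between chunks. All names and sizes here (StubModel, the chunk and overlap lengths) are made up for illustration; this is not the parakeet-mlx API.

```python
# Sketch: chunked streaming on top of a batch ASR interface.
# StubModel and the chunk/overlap sizes are illustrative only,
# not taken from parakeet-mlx.
from dataclasses import dataclass, field

SAMPLE_RATE = 16_000
CHUNK = SAMPLE_RATE          # feed 1 s of audio at a time
OVERLAP = SAMPLE_RATE // 4   # carry 250 ms of left context

@dataclass
class StubModel:
    """Stands in for a real encoder/decoder; echoes window size."""
    def transcribe(self, samples: list[float]) -> str:
        return f"<{len(samples)} samples>"

@dataclass
class StreamingWrapper:
    model: StubModel
    _tail: list[float] = field(default_factory=list)

    def feed(self, samples: list[float]) -> str:
        # Prepend the tail of the previous chunk so the model sees
        # enough left context across the chunk boundary.
        window = self._tail + samples
        self._tail = window[-OVERLAP:]
        return self.model.transcribe(window)

stream = StreamingWrapper(StubModel())
audio = [0.0] * (3 * SAMPLE_RATE)  # 3 s of dummy audio
for start in range(0, len(audio), CHUNK):
    print(stream.feed(audio[start:start + CHUNK]))
```

A real implementation would also cache encoder states instead of recomputing the overlap, which is where most of the performance effort goes.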
There's a minimum possible latency given the structure of language and how humans process phonemes. Spoken language isn't entirely causal, so there's a limit to how far you can push latency down at a given accuracy. I don't know where the efficiency curve is, though. It wouldn't surprise me if 100 ms was pushing it.
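To make that concrete: even with instant compute, a chunked streaming model can't emit a symbol before its chunk plus any right-context (lookahead) has arrived. A back-of-envelope calculation, with illustrative numbers rather than measurements of any model:

```python
# Back-of-envelope algorithmic latency for chunked streaming ASR.
# The 80 ms / 40 ms figures are illustrative, not model measurements.
def algorithmic_latency_ms(chunk_ms: float, lookahead_ms: float) -> float:
    # A symbol at the start of a chunk can only be emitted once the
    # whole chunk plus any lookahead frames have been received.
    return chunk_ms + lookahead_ms

print(algorithmic_latency_ms(80, 40))  # 120.0
```

So modest chunk and lookahead sizes already land above 100 ms before any compute time is counted.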
ahaferburg|2 days ago
There's https://kyutai.org/stt, which is very low latency. But it doesn't seem as hackable.