Parakeet does streaming, I think, so if you throw enough compute at it, it should be fast enough. The closest competitor is Whisper v3, which is relatively slow; maybe Voxtral, but it's still very new.
The Python MLX version of Parakeet does indeed support streaming: https://github.com/senstella/parakeet-mlx
It requires modifying the inference algorithm. In this implementation, I see the author even uses a custom Metal kernel to get maximum performance.
Parakeet's batch inference logic is simple, but streaming may require some effort to get the best performance. It's not only a dependency issue.
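To illustrate the kind of modification involved, here's a minimal sketch of wrapping a batch-style transcriber for chunked streaming by carrying left-context overlap between chunks. All names and sizes here (StubModel, the chunk and overlap lengths) are made up for illustration; this is not the parakeet-mlx API.

```python
# Sketch: chunked streaming on top of a batch ASR interface.
# StubModel and the chunk/overlap sizes are illustrative only,
# not taken from parakeet-mlx.
from dataclasses import dataclass, field

SAMPLE_RATE = 16_000
CHUNK = SAMPLE_RATE          # feed 1 s of audio at a time
OVERLAP = SAMPLE_RATE // 4   # carry 250 ms of left context

@dataclass
class StubModel:
    """Stands in for a real encoder/decoder; echoes window size."""
    def transcribe(self, samples: list[float]) -> str:
        return f"<{len(samples)} samples>"

@dataclass
class StreamingWrapper:
    model: StubModel
    _tail: list[float] = field(default_factory=list)

    def feed(self, samples: list[float]) -> str:
        # Prepend the tail of the previous chunk so the model sees
        # enough left context across the chunk boundary.
        window = self._tail + samples
        self._tail = window[-OVERLAP:]
        return self.model.transcribe(window)

stream = StreamingWrapper(StubModel())
audio = [0.0] * (3 * SAMPLE_RATE)  # 3 s of dummy audio
for start in range(0, len(audio), CHUNK):
    print(stream.feed(audio[start:start + CHUNK]))
```

A real implementation would also cache encoder states instead of recomputing the overlap, which is where most of the performance effort goes.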
There's a minimum possible latency given the structure of language and how humans process phonemes. Spoken language isn't entirely causal, so there's a limit to how far you can push latency down at a given accuracy. I don't know where the efficiency curve is, though. It wouldn't surprise me if 100 ms was pushing it.
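To make that concrete: even with instant compute, a chunked streaming model can't emit a symbol before its chunk plus any right-context (lookahead) has arrived. A back-of-envelope calculation, with illustrative numbers rather than measurements of any model:

```python
# Back-of-envelope algorithmic latency for chunked streaming ASR.
# The 80 ms / 40 ms figures are illustrative, not model measurements.
def algorithmic_latency_ms(chunk_ms: float, lookahead_ms: float) -> float:
    # A symbol at the start of a chunk can only be emitted once the
    # whole chunk plus any lookahead frames have been received.
    return chunk_ms + lookahead_ms

print(algorithmic_latency_ms(80, 40))  # 120.0
```

So modest chunk and lookahead sizes already land above 100 ms before any compute time is counted.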
ahaferburg|2 days ago
There's https://kyutai.org/stt, which is very low latency. But it doesn't seem as hackable.