The fork that I've been using, WhisperX, seems to do better. I've used it on clean splits of mic tracks (i.e., total silence while the other speaker is talking) and gotten far fewer hallucinations.

ethan_smith|7 months ago

WhisperX does better because it runs a VAD (Voice Activity Detection) preprocessing step that filters out silent segments before they reach the model, which removes most of the input that triggers hallucinations in the first place.
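
For anyone wondering what a VAD pass does in principle, here's a minimal sketch. To be clear, this is not WhisperX's VAD (which, as far as I know, uses a trained model rather than a threshold). It's a crude RMS energy gate, and the function name `speech_segments` and all of its parameters and defaults are made up for illustration. It assumes mono samples as floats in [-1, 1].

```python
import math


def speech_segments(samples, sample_rate, frame_ms=30, threshold=0.01, min_speech_ms=90):
    """Return (start_sec, end_sec) spans whose frame RMS energy is >= threshold.

    Hypothetical helper for illustration only: a simple energy gate,
    not the model-based VAD that WhisperX ships with.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame_ms is too small for this sample rate")

    n_frames = len(samples) // frame_len  # trailing partial frame is ignored
    runs = []      # (first_frame, end_frame_exclusive) for each loud run
    start = None   # index of the first frame in the current loud run

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms >= threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, n_frames))

    # Drop runs too short to plausibly be speech (clicks, pops).
    min_frames = math.ceil(min_speech_ms / frame_ms)
    return [
        (s * frame_len / sample_rate, e * frame_len / sample_rate)
        for s, e in runs
        if e - s >= min_frames
    ]
```

You'd then cut the audio to just those spans and transcribe each one, so the model never gets handed a long stretch of silence. On a perfectly clean mic split even a gate this dumb would catch the silent stretches; with room noise or crosstalk it would struggle, which is where a trained VAD earns its keep.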