item 33888418

darkpicnic | 3 years ago

Does anyone know if this new model handles silence better? I was trying to use Whisper to transcribe bursts of talking amid long spans of silence, but hallucinations were too frequent.

nomel|3 years ago

I suspect a simple solution is to remove the silence as a preprocessing step in the pipeline.
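A minimal sketch of that preprocessing step, using only the Python standard library and a plain energy gate (the "<= decibel x" approach). It assumes 16-bit mono PCM WAV input (Whisper resamples to 16 kHz mono internally), 30 ms frames, and a -40 dBFS threshold. The function names and both numbers are illustrative assumptions, not tuned values.

```python
import array
import math
import sys
import wave
from typing import List, Sequence, Tuple


def read_pcm16_mono(path: str) -> Tuple[array.array, int]:
    """Read a 16-bit mono PCM WAV file into an array of ints plus its sample rate."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        rate = wf.getframerate()
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is stored little-endian
    return samples, rate


def frame_dbfs(samples: Sequence[int], frame_len: int) -> List[float]:
    """RMS level of each full frame in dBFS (0 dBFS = 16-bit full scale)."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        # Digital silence has rms == 0 and log10(0) is undefined, so map it to -inf.
        levels.append(20 * math.log10(rms / 32768) if rms > 0 else float("-inf"))
    return levels


def remove_silence(samples: Sequence[int], sample_rate: int = 16000,
                   frame_ms: int = 30, threshold_db: float = -40.0) -> List[int]:
    """Keep only frames louder than threshold_db; a trailing partial frame is dropped."""
    frame_len = sample_rate * frame_ms // 1000
    kept: List[int] = []
    for i, level in enumerate(frame_dbfs(samples, frame_len)):
        if level > threshold_db:
            kept.extend(samples[i * frame_len:(i + 1) * frame_len])
    return kept
```

Typical use would be `samples, rate = read_pcm16_mono("meeting.wav")` (hypothetical file) followed by `remove_silence(samples, sample_rate=rate)`. Concatenating the surviving frames throws away timing, so if you need Whisper's timestamps to line up with the original recording, keep the kept frame indices and map segment times back.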

lunixbochs|3 years ago

In large scale tests, I observed hallucinations from Whisper in speech regions of audio.

gibolt|3 years ago

It's still important, for future use, not to produce invalid results. This is a workaround for now.

rozab|3 years ago

You don't need ML to trim out silence.

sdenton4|3 years ago

Silence is often problem-dependent. You may want ML to differentiate between noisy audio with speech and noisy audio without speech.

darkpicnic|3 years ago

"Silence" is a problematic term. For me, that word encompasses squeaky chairs, typing on a loud keyboard, moving objects around on my table, etc. In a perfect world, Whisper, like a human, could easily distinguish a human voice from the din of my office and only try to transcribe my voice.

Does anyone have solutions for clearing "silence" out of an audio file that work off something a bit more accurate than just "<= decibel x"?
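One common refinement over a bare dB threshold is to smooth the per-frame decisions: discard active runs too short to be speech (a key press or chair squeak is often a brief transient), bridge short pauses between words, and pad segment edges so word onsets aren't clipped. A minimal sketch, assuming one boolean per 30 ms frame; `speech_segments` and its defaults (150 ms minimum run, 300 ms gap bridging, 90 ms padding) are hypothetical starting points, not tuned numbers.

```python
from typing import List, Sequence, Tuple


def speech_segments(active: Sequence[bool], min_speech_frames: int = 5,
                    max_gap_frames: int = 10, pad_frames: int = 3) -> List[Tuple[int, int]]:
    """Turn per-frame speech flags into (start, end) frame ranges, end exclusive."""
    # 1. Collect raw runs of consecutive active frames.
    runs: List[Tuple[int, int]] = []
    start = None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(active)))

    # 2. Drop runs too short to be speech. Doing this before gap-bridging keeps
    #    a burst of closely spaced key clicks from fusing into one long "segment".
    runs = [(s, e) for s, e in runs if e - s >= min_speech_frames]

    # 3. Bridge short pauses between the surviving runs.
    merged: List[Tuple[int, int]] = []
    for s, e in runs:
        if merged and s - merged[-1][1] <= max_gap_frames:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))

    # 4. Pad each segment, clamp to the signal, and re-merge any overlaps.
    out: List[Tuple[int, int]] = []
    for s, e in merged:
        s, e = max(0, s - pad_frames), min(len(active), e + pad_frames)
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out
```

This still only rejects short transients; sustained noise at speech-like levels will pass an energy gate. That is where a trained voice activity detector helps, for example WebRTC's VAD (wrapped by the `webrtcvad` Python package) or Silero VAD, whose per-frame speech decisions could be fed into the same smoothing.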

Edited for grammar.