Whisper will hallucinate on audio segments that don't contain any speech. VAD (voice activity detection) mitigates that. Expect worse results without it, especially on non-English audio.
Is the point that you only need one tool, ffmpeg, to both generate transcripts and embed them in a video, as opposed to needing multiple tools?
This is a 3-part series; the first post discusses the new native Whisper integration. And correct, for the first post the point is to show that you can use ffmpeg alone to transcribe audio and embed the subtitles in a video.
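For readers who want to see roughly what that looks like, here is a minimal sketch that only builds the two ffmpeg command lines (transcribe, then embed) without running them. It assumes an ffmpeg build with the `whisper` audio filter enabled and that the filter accepts the `model`, `language`, `destination`, `format`, and `vad_model` options. The helper names and all file names are hypothetical, the paths are assumed to contain no characters that need filtergraph escaping, and the embed step assumes soft subtitles in an MP4 container (`mov_text`) rather than burned-in text.

```python
from typing import List, Optional


def whisper_filter(model: str, destination: str, language: str = "en",
                   vad_model: Optional[str] = None) -> str:
    """Build the option string for ffmpeg's whisper audio filter.

    Option names are assumed from the ffmpeg whisper filter; paths must not
    contain ':' or other characters that need filtergraph escaping.
    """
    opts = [
        f"model={model}",
        f"language={language}",
        f"destination={destination}",
        "format=srt",
    ]
    if vad_model:
        # Optional VAD model, to skip segments without speech.
        opts.append(f"vad_model={vad_model}")
    return "whisper=" + ":".join(opts)


def transcribe_cmd(video: str, model: str, srt: str,
                   vad_model: Optional[str] = None) -> List[str]:
    """Command that decodes the audio and writes an SRT file via the filter."""
    audio_filter = whisper_filter(model, srt, vad_model=vad_model)
    # "-f null -" discards the decoded output; the filter writes the SRT itself.
    return ["ffmpeg", "-i", video, "-vn", "-af", audio_filter, "-f", "null", "-"]


def embed_cmd(video: str, srt: str, output: str) -> List[str]:
    """Command that muxes the SRT into an MP4 as a soft subtitle track."""
    return ["ffmpeg", "-i", video, "-i", srt,
            "-c", "copy", "-c:s", "mov_text", output]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to actually execute it. Keeping command construction separate from execution makes it easy to inspect the exact arguments before running anything.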
pinter69|3 months ago
lern_too_spel|3 months ago
cranberryturkey|3 months ago
trq01758|3 months ago
mikece|3 months ago
pinter69|3 months ago
radicality|3 months ago
pinter69|3 months ago
If you mean an ffmpeg build with Whisper: from memory, I didn't see ffmpeg-builds for Mac, so you will probably need to compile it yourself.