I would say that overall they are much, much better than the auto generated ones from YouTube. If the speaker speaks incredibly clearly and slowly, without slang, etc, then the built in ones are good enough. But in a tougher situation, the biggest whisper model achieves near superhuman accuracy— way better.
While it seems YouTube's auto-generated are hit or miss, I wonder if feeding them through an LLM can fix the mistakes and still get the video's idea out of them
I've found that to be the case. I typically don't want a full transcript -- I want the materials list, or a summary, or a counterargument. I've found it is totally sufficient to just plop the transcript into an LLM and ask for my desired output. No need to clean of the transcript ahead of time.
eigenvalue|2 years ago
kawsper|2 years ago
English is my second language, and I mumble.
BetterWhisper|2 years ago
josephrmartinez|2 years ago
tsurba|2 years ago