Subtitle is now open-source

[+] ipsum2|2 years ago|reply

Whisper already generates subtitles[0], supporting VTT and SRT so this is just a thin wrapper around that.

[0]: https://github.com/openai/whisper/blob/e58f28804528831904c3b...

[+] dicytea|2 years ago|reply

Yeah, you can just do whisper --output_format vtt, so I have no idea what this wrapper even adds.
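For anyone unsure what that VTT output amounts to, here is a minimal sketch of turning timed segments into WebVTT text. It is not Whisper's own writer; `format_timestamp` and `segments_to_vtt` are illustrative names, and the `start`/`end`/`text` dict shape is an assumption about how the transcription segments are handed over.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def segments_to_vtt(segments) -> str:
    """Build WebVTT text from dicts with 'start', 'end' (seconds) and 'text'.

    The dict shape is an assumption made for this sketch.
    """
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"{start} --> {end}")
        lines.append(seg["text"].strip())
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


if __name__ == "__main__":
    print(segments_to_vtt([{"start": 0.0, "end": 2.5, "text": " Hello"}]))
```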

I wonder if the whole thing is just an AI-generated project. The "About Me" section is pretty illuminating (unabridged):

> I'm a Developer i will feel the code then write.

[+] innovatorved|2 years ago|reply

While that approach may seem simpler, this project uses a more optimized and faster model, which improves efficiency and performance.

[+] codethief|2 years ago|reply

I was surprised to see there were no ML-related dependencies (neither models nor libraries), so I had a look at the code: The models are downloaded from Huggingface, and the repo comes with a precompiled whisper.cpp binary to execute them.

[+] innovatorved|2 years ago|reply

Yes, for more info check the project references

[+] einpoklum|2 years ago|reply

A few things I don't understand...

* What languages are supported? Is there a list?

* What does 'subtitle' do, which 'whisper' doesn't?

* How do I install this system-wide on an apt-based system (in which pip install --system doesn't work)?

[+] socks|2 years ago|reply

it's just whisper and some code that downloads the models from huggingface

[+] vjulian|2 years ago|reply

I have a question: I have 200-300 hours of audio recordings of interviews. I am using Otter.ai to automate transcription, and for each recording I export a ".vtt" file of the transcript.

What I'd like to do is create a type of ebook of all these transcripts, where if I click on a word, then the corresponding audio will start playing from roughly the same point in time within the interview.

Otter can do this already (if I'm online and logged in to their website), but I don't want to be tied to their website forever. I'd like to have a local copy that can perform similarly. Amazon ebooks can do this as well, I believe, where there is a corresponding verbatim audiobook. However, this project of mine is purely personal. I won't be selling my audio recordings or transcripts.

Any advice? Could software discussed here be helpful in what I'm trying to accomplish?

[+] akx|2 years ago|reply

This software won't help you.

If you already have a .vtt, this is not a hard exercise to do e.g. entirely in a browser: parse the .vtt (they're simple text), lay out the text as you like with each segment being a clickable element (e.g. a link), and hook that up to seek an `<audio>` element to where you like.
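A rough sketch of that idea, written as a small Python script that generates a static local page rather than parsing in the browser. Everything named here (`parse_vtt`, `build_page`, the `interview.mp3` file name) is hypothetical, and the parser only handles plain cues; Otter's exports may carry speaker tags or other extras it does not account for.

```python
import html
import re

# Matches "hh:mm:ss.ttt --> hh:mm:ss.ttt"; the hours part is optional in WebVTT.
CUE_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def to_seconds(h, m, s, ms):
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_vtt(text):
    """Return a list of (start_seconds, end_seconds, cue_text) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = CUE_TIMING.search(line)
            if match:
                g = match.groups()
                body = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append((to_seconds(*g[:4]), to_seconds(*g[4:]), body))
                break  # header and note blocks have no timing line and are skipped
    return cues


def build_page(cues, audio_src):
    """Return an HTML page where clicking a segment seeks the <audio> element."""
    links = "\n".join(
        f'<a href="#" data-start="{start:.3f}">{html.escape(text)}</a>'
        for start, _end, text in cues
    )
    return f"""<!doctype html>
<meta charset="utf-8">
<audio id="player" controls src="{html.escape(audio_src, quote=True)}"></audio>
<p>
{links}
</p>
<script>
const player = document.getElementById("player");
document.querySelectorAll("a[data-start]").forEach((link) => {{
  link.addEventListener("click", (event) => {{
    event.preventDefault();
    player.currentTime = parseFloat(link.dataset.start);
    player.play();
  }});
}});
</script>
"""


if __name__ == "__main__":
    sample = "WEBVTT\n\n00:00:01.000 --> 00:00:04.500\nHello there\n"
    print(build_page(parse_vtt(sample), "interview.mp3"))
```

Writing the returned string to an `.html` file next to the audio gives a page that works offline. Seeking is per segment, not per word, because that is the granularity a typical .vtt provides.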

[+] rainburg|2 years ago|reply

AFAIK Whisper still can't handle multi-language content. If the audio has two languages (different narrators, for example), Whisper transcribes both of them during the first minute or so, and then either entirely skips one of the languages, or translates the foreign language to English, for the rest of the audio.

So, the value proposition of a subtitle-generating wrapper for Whisper would be to have an option to split audio into ~1 minute segments, transcribe them separately, and to somehow accurately join them. And I don’t think this one does such a thing.
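The bookkeeping side of that could look roughly like the sketch below. It assumes fixed-length cuts and that each chunk is transcribed separately into segments whose times are relative to the chunk start; `chunk_windows` and `join_chunks` are made-up names and no real transcription call is shown. Hard cuts can split a word in half, so the "accurately join" part is not solved here.

```python
def chunk_windows(total_seconds, chunk_seconds=60.0):
    """Yield (start, end) windows, in seconds, that cover the whole recording."""
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        yield (start, end)
        start = end


def join_chunks(chunk_results):
    """Merge per-chunk segments into one timeline.

    chunk_results is a list of (offset_seconds, segments) pairs, where each
    segment is a dict with 'start', 'end' (relative to the chunk) and 'text'.
    That shape is an assumption made for this sketch.
    """
    merged = []
    for offset, segments in chunk_results:
        for seg in segments:
            merged.append({
                "start": seg["start"] + offset,  # shift to absolute time
                "end": seg["end"] + offset,
                "text": seg["text"],
            })
    return merged


if __name__ == "__main__":
    print(list(chunk_windows(150, 60)))
```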

[+] nottorp|2 years ago|reply

I don't know what you're thinking about but when I watch a movie I'm happy if all subtitles are in the same language :) One that I know ideally.

[+] extua|2 years ago|reply

I could see myself using this. Subtitling things is extremely time-consuming, and there aren't that many tools that will automate it for you. It looks pretty straightforward to use: just two steps to install (if you already have FFmpeg and Python), and then one command to run the script. Well done!

[+] innovatorved|2 years ago|reply

If you find this project helpful, please consider starring the repo

[+] callalex|2 years ago|reply

@dang this person is clearly using sock puppet accounts on HN.

[+] Vaslo|2 years ago|reply

Which person??

[+] hr2016|2 years ago|reply

Very interesting, would it be (is it?) possible to output the subtitle in a different language? For example English to Icelandic?

[+] btdmaster|2 years ago|reply

I wonder how much more a model would learn about subtitles from including audio AND video in training. Sure, the costs would be way bigger (parsing video even deterministically is 1.5 orders of magnitude worse than audio) but it might help with the edge cases where the speech is so unclear even the subtitle scene can't agree.

[+] innovatorved|2 years ago|reply

[deleted]

[+] benob|2 years ago|reply

There is also WhisperX, a modification of Whisper with accurate word timing and confidence scores.

It gives pretty good subtitles.

[+] whywhywhywhy|2 years ago|reply

Could really benefit from an example of what comes out the other end of it, in this article and in the repo.

[+] elkos|2 years ago|reply

Maybe an off-topic comment.

I'm not a native English speaker and I tend to use the LiveCaption application in Linux when I attend English speaking online meetings. Would love to have the opportunity to have subtitles in my native language (Greek) too while doing so.

[+] anthk|2 years ago|reply

I do the same with tech-oriented podcasts. The speech in them is clear, so transcribing them correctly is very easy. Non-native English speaker here, too.

[+] einpoklum|2 years ago|reply

Seems not to work - it fails to generate a VTT file:

https://github.com/innovatorved/subtitle/issues/6

[+] epups|2 years ago|reply

Thank you, I was looking for a similar tool and was surprised by how difficult it was to find something with no bloat. Will give it a shot

[+] lern_too_spel|2 years ago|reply

I've gotten good results with whisperx when I needed to generate captions. https://github.com/m-bain/whisperX

There is currently a problem with diarization, but otherwise, it is SOTA.

[+] innovatorved|2 years ago|reply

Thank you for checking out the project! If you find it useful, please consider starring the repository.

[+] butz|2 years ago|reply

What hardware do I need to run this locally? What languages are supported?

[+] alberth|2 years ago|reply

Siri

I hope Siri does something to improve. Its voice-to-text is still horrible for me.

37 comments