
Yt-transcriber – Give a YouTube URL and get a transcription

174 points | Bluestein | 7 months ago | github.com

57 comments

[+] paulirish|7 months ago|reply
Can also just fetch the subs already in YouTube rather than retranscribing. eg:

yt-dlp --write-auto-subs --skip-download "https://www.youtube.com/watch?v=7xTGNNLPyMI"
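
The same fetch can be driven from Python when yt-dlp is used as a library. A minimal sketch, assuming the `yt_dlp` package is installed; the option keys mirror the CLI flags above, and `subtitle_options` / `fetch_auto_subs` are hypothetical helper names, not part of yt-dlp:

```python
def subtitle_options(lang="en", out_dir="."):
    """Build yt-dlp options equivalent to --write-auto-subs --skip-download."""
    return {
        "skip_download": True,       # do not fetch the video itself
        "writeautomaticsub": True,   # auto-generated captions (--write-auto-subs)
        "subtitleslangs": [lang],    # which caption language(s) to request
        "outtmpl": f"{out_dir}/%(id)s.%(ext)s",
        "quiet": True,
    }


def fetch_auto_subs(url, lang="en", out_dir="."):
    """Download only the auto-generated subtitles for one video."""
    import yt_dlp  # imported lazily so subtitle_options() works without it

    with yt_dlp.YoutubeDL(subtitle_options(lang, out_dir)) as ydl:
        return ydl.download([url])  # yt-dlp's return code


# Usage (needs network access):
# fetch_auto_subs("https://www.youtube.com/watch?v=7xTGNNLPyMI")
```

Whether auto-generated subtitles exist for a given video and language is up to YouTube, so the call may write nothing.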

[+] adamgordonbell|7 months ago|reply
Recently, I was working on a similar project and found that grabbing the transcripts soon gets your IP blocked from fetching them.

I ended up doing the same as this person: downloading the MP4s and then transcribing them myself. I assumed it was some sort of anti-LLM-scraper feature they had put in place.

Has anyone used this --write-auto-subs flag and not been flagged after doing 20 or so videos?

[+] toomuchtodo|7 months ago|reply
It's a good call out. I leverage yt-dlp as a library for downstream tooling (archival of media to long term storage repositories), and always recommend folks rely on yt-dlp whenever possible due to the ecosystem of folks grinding to keep their extractors current. Their maintainers are both helpful and responsive.

(with that said, I do not want to diminish OP's work in any way; great job! "What I cannot create, I do not understand" - Feynman)

[+] mckirk|7 months ago|reply
I've found the YT transcripts to be severely lacking sometimes, in accuracy and features. Speaker identification in particular is really useful if you want to e.g. summarize podcasts or interviews, so if this project delivers on that then it's definitely better than the YT transcripts.
[+] 0points|7 months ago|reply
YouTube already offers AI transcriptions on their site. As another commenter points out, you can grab them with yt-dlp.

And unlike how your tool will be supported in the future, thousands of users make sure yt-dlp keeps working as Google keeps changing the site (currently 1459 contributors).

[+] swyx|7 months ago|reply
if you used this in earnest for long enough, you'd know the default youtube transcripts are not good enough, because youtube often (ok, say 5% of the time) fails to transcribe videos, particularly livestreams and videos shortly after release.

youtube also blocks transcript exports for some things like https://youtubetranscript.com/

retranscribing is a necessary and important part of the creator toolset.

[+] passivegains|7 months ago|reply
the volunteer open source effort behind youtube-dl and its forks/descendants is so impressive in large part because of how many features they provide and thus have to maintain: https://github.com/yt-dlp/yt-dlp#usage-and-options. this tool won't provide the list of available thumbnails or settings for HTTP buffer size, but I think that's a pretty reasonable tradeoff.
[+] totallynotryan|7 months ago|reply
Hey all, I built a 100% free (no-signup) youtube summarizer: "https://youtube-summarizer-lime.vercel.app/". Accurate summaries in under 8 seconds.
[+] dudeWithAMood|7 months ago|reply
How did you get around youtube blocking cloud IP ranges? Are you using residential proxies?
[+] 93po|7 months ago|reply
bookmarked, thanks, the top google search results always require sign-up. frustrating state of the internet
[+] yunusabd|7 months ago|reply
I tried it on an M1 Pro MBP using Docker. It's quite slow (no MPS) and there are no timestamps in the resulting transcript. But the basics are there. Truncated output:

  Fetching video metadata...
  Downloading from YouTube...
  Generating transcript using medium model...

  === System Information ===
  CPU Cores: 10
  CPU Threads: 10
  Memory: 15.8GB
  PyTorch version: 2.7.1+cpu
  PyTorch CUDA available: False
  MPS available: False
  MPS built: False
  
  Falling back to CPU only
  Model stored in: /home/app/.cache/whisper
  Loading medium model into CPU...
  100%|| 1.42G/1.42G [02:05<00:00, 12.2MiB/s]
  Model loaded, transcribing...
  Model size: 1457.2MB
  Transcription completed in 468.70 seconds
  === Video Metadata ===
  Title: 厨师长教你:“酱油炒饭”的家常做法,里面满满的小技巧,包你学会炒饭的最香做法,粒粒分明!
  Channel: Chef Wang 美食作家王刚
  Upload Date: 20190918
  Duration: 5:41
  URL: https://www.youtube.com/watch?v=1Q-5eIBfBDQ
  === Transcript ===
  
  哈喽大家好我是王刚本期视频我跟大家分享...
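
On the missing timestamps: assuming the tool runs openai-whisper underneath (the cache path in the log suggests so, but that is a guess), the dict returned by `transcribe()` carries a `segments` list with `start`, `end` and `text` per segment, so a timestamped transcript can be rebuilt from it. A rough sketch; `format_timestamp` and `timestamped_lines` are hypothetical helpers, not part of yt-transcriber:

```python
def format_timestamp(seconds):
    """Render a second count as H:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def timestamped_lines(segments):
    """Turn Whisper-style segments into '[start - end] text' lines."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return lines


# Hand-written stand-in for result["segments"] from model.transcribe(path)
example = [
    {"start": 0.0, "end": 4.2, "text": " first segment"},
    {"start": 4.2, "end": 65.9, "text": " second segment"},
]
print("\n".join(timestamped_lines(example)))
```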
[+] pstoll|7 months ago|reply
> Falling back to CPU only

Patient: “Doctor, it hurts when I do this.”

Doctor: “Don’t do that.”

[+] cmaury|7 months ago|reply
Thanks for sharing. This is exactly the type of utility that vibecoding is for. It takes 5 seconds to ask GPT to write a script to do this, tailored to your specific use case. It's way faster than trying to get someone else's repo up and running.
[+] Bluestein|7 months ago|reply
Sure thing ...

And, yes, indeed, AI coding is having an order-of-magnitude effect along the lines that "low-code" was aiming for ...

... also, for less-capable coders or "borderline" coders the effort/benefit equation has radically shifted.-

[+] labrador|7 months ago|reply
Many channels I follow, such as Vlad Vexler, have taken measures so you can't download the transcript with yt-dlp. Furthermore, they don't provide a transcript option on their videos. I assume this is to prevent people from just reading AI summaries, which is annoying in Vexler's case because he talks slowly and meanders around. If I really want to hear his point but don't want to listen to that, then I download the video with yt-dlp and use Whisper to transcribe it.
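
Roughly what that download-then-transcribe step looks like in Python. This is a sketch under assumptions, not the commenter's actual script: it assumes the `yt_dlp` and `openai-whisper` packages plus ffmpeg are installed, and `audio_options` / `download_and_transcribe` are made-up names:

```python
def audio_options(out_dir="."):
    """yt-dlp options that fetch only the best available audio stream."""
    return {
        "format": "bestaudio/best",
        "outtmpl": f"{out_dir}/%(id)s.%(ext)s",
        "quiet": True,
    }


def download_and_transcribe(url, model_name="medium", out_dir="."):
    """Download a video's audio, then transcribe it locally with Whisper."""
    import whisper  # openai-whisper; decodes audio through ffmpeg
    import yt_dlp

    with yt_dlp.YoutubeDL(audio_options(out_dir)) as ydl:
        info = ydl.extract_info(url, download=True)
        audio_path = ydl.prepare_filename(info)  # path the file was saved to

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)["text"]
```

Audio-only is enough for transcription and is a much smaller download than the full video.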
[+] rs186|7 months ago|reply
Curious, if you find this "annoying", why are you still following the channel? There must be other YouTube channels that offer similar content but deliver it in a better way.
[+] Bluestein|7 months ago|reply
... the ... slower ... the guy the ... less ... content ... and ... more ... advertising.-
[+] mikeve|7 months ago|reply
Interesting project! I've been working on a project in this space myself (WaveMemo)

I must say, speaker diarization is surprisingly tricky to do. The most common approach seems to be to use pyannote, but the quality is not amazing...
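
For context, one common way to combine the two outputs is to label each transcript segment with whichever speaker's turns overlap it most. A minimal sketch, assuming the diarizer (pyannote or anything else) has already been reduced to `(start, end, speaker)` tuples and the transcript to Whisper-style segment dicts; both function names are hypothetical:

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each segment with the speaker whose turns overlap it most.

    segments: dicts with "start", "end", "text"
    turns:    (start, end, speaker) tuples from a diarizer
    """
    labeled = []
    for seg in segments:
        totals = {}  # speaker -> seconds of overlap with this segment
        for t_start, t_end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], t_start, t_end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        best = max(totals, key=totals.get) if totals else unknown
        labeled.append({**seg, "speaker": best})
    return labeled
```

This does nothing about the diarizer's own mistakes; a segment that spans a speaker change still gets a single label.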

[+] toddmorey|7 months ago|reply
Always fascinated to read CLAUDE.md files that are appearing in more and more open source projects: https://github.com/pmarreck/yt-transcriber/blob/yolo/CLAUDE....

I'd be really curious to see some sort of benchmark / evaluation of these context resources against the same coding tasks. Right now, the instructions all sound so prescriptive and authoritative, yet it is really hard to evaluate their effectiveness.

[+] lpeancovschi|7 months ago|reply
Youtube's T&C don't allow downloading youtube audio/video. How do other services get away with it?
[+] MysticOracle|7 months ago|reply
I think they use rotating IP/Proxy services
[+] arkaic|7 months ago|reply
On this note, is YouTube also the best transcriber of foreign languages or is there something better?
[+] manishsharan|7 months ago|reply
Will this make Google mad at me and cancel/freeze all my Google services ?
[+] senko|7 months ago|reply
I vibecoded something similar for myself; it transcribes and summarizes the content into article format: https://github.com/senko/scribe

Uses yt-dlp, whisper, and an LLM (Gemini hardcoded because it handles long contexts well, but easy to switch) for the summarizer.

I dislike podcasts as a format (S/N level way too low for my taste), so I use this whenever I want to get a tldr of some episode.

I should check out the SOTA models and improve the summarization prompt, but I'm not in a hurry as this works pretty well for my needs already.