
Show HN: I made a website that converts YT videos into step-by-step guides

271 points | aka_sh | 1 year ago | stepify.tech

Hey HN,

I've been working on this side project for the past month. It generates a step-by-step tutorial guide for YouTube videos that you can follow along with, without watching long videos. Best suited for tutorial videos, but it can work for other videos as well. No BS. Just straight to the point.

The guides are generated from pure transcript so you don't have to worry about it being AI. It's my first project as a total beginner. Something I had to do in order to get out of tutorial hell.

Please let me know if you have any suggestions or if you face any problems or bugs. I'll try to fix them to the best of my abilities and as soon as possible.

I would appreciate your feedback on this. Let me know what you think!

128 comments


metadat|1 year ago

This is a brilliant and useful application of LLM technology, I'm impressed.

One question: on the backend, is it downloading each video's CC (closed-caption) transcript and feeding that into a tuned prompt? What happens for videos where this is missing? Asking because I've noticed CC is occasionally unavailable for some YouTube videos.

If you cared to have a fallback, a potentially interesting experiment / solution for such cases is to download the video, extract the audio to a WAV file, then run the audio through Whisper [1] to generate the transcript. On a CPU, it will still be incredibly intensive and slow, generally not much faster than real time (e.g. a 5-minute clip will take on the order of ~5 minutes to transcribe). However, with Whisper running on a fancy GPU it is insanely faster, roughly 100-200x faster, meaning even for long videos, generating the transcripts will complete in only a few seconds.
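A minimal sketch of that fallback, assuming the video file has already been downloaded (the download step is left out) and that `ffmpeg` and the `whisper` command-line tool from [1] are installed. It only builds the two commands; the helper names and the `base` model choice are illustrative assumptions, and the speed-up factors are the rough figures from this comment, not benchmarks.

```python
import shlex
from pathlib import Path


def whisper_fallback_commands(video_path: str, model: str = "base") -> list[list[str]]:
    """Build (but do not run) the commands for a video with no captions."""
    wav_path = str(Path(video_path).with_suffix(".wav"))
    # Drop the video stream (-vn) and write mono (-ac 1), 16 kHz (-ar 16000) WAV audio.
    extract_audio = ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", wav_path]
    # Transcribe the WAV with the Whisper CLI and write a plain-text transcript.
    transcribe = ["whisper", wav_path, "--model", model, "--output_format", "txt"]
    return [extract_audio, transcribe]


def estimated_transcription_seconds(duration_seconds: float, speedup: float) -> float:
    """Rough wall-clock estimate: audio length divided by the speed-up over real time."""
    return duration_seconds / speedup


if __name__ == "__main__":
    for cmd in whisper_fallback_commands("tutorial.mp4"):
        print(shlex.join(cmd))
    clip = 5 * 60  # a 5-minute clip, in seconds
    print(estimated_transcription_seconds(clip, 1))    # CPU, roughly real time
    print(estimated_transcription_seconds(clip, 100))  # GPU, low end of the 100-200x range
```

Each command list can be passed to `subprocess.run(cmd, check=True)` in order; keeping them as lists avoids shell-quoting problems with odd file names.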

Great job @aka_sh!

[1] https://github.com/openai/whisper

p.s. Is there any chance you'd open source your code? Or do you plan to turn this into a business? The code itself isn't exactly a huge moat, and it'd be cool to see how you did this. Cheers.

p.p.s. The stepify.tech app is currently crashing to a Heroku error page when I try to submit a YT link.

aka_sh|1 year ago

Thank you! I'm getting the transcript through an API and feeding it to GPT. For now, the fallback for videos without captions is just to make something out of the video's description. I really appreciate the suggestion; I'll experiment with Whisper. Regarding open source or business, I don't really know yet. Maybe I'll lean towards the business side to cover the costs and see where this goes. And sorry for the downtime! The API credits ran out. It should be fixed by now.
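A sketch of the flow described here, assuming the transcript API returns either plain text or nothing. The names `GuideSource`, `pick_source` and `build_prompt`, and the prompt wording, are hypothetical, and the actual GPT call is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GuideSource:
    text: str
    from_transcript: bool  # False means we fell back to the description


def pick_source(transcript: Optional[str], description: str) -> GuideSource:
    """Prefer the caption transcript; fall back to the video description."""
    if transcript and transcript.strip():
        return GuideSource(transcript.strip(), from_transcript=True)
    return GuideSource(description.strip(), from_transcript=False)


def build_prompt(source: GuideSource) -> str:
    """Hypothetical prompt wording; the fallback branch lets the model decline."""
    if source.from_transcript:
        instruction = "Turn this video transcript into numbered, step-by-step instructions."
    else:
        instruction = (
            "No transcript is available. Using only this video description, list steps "
            "only if the description actually contains them; otherwise reply NO_STEPS."
        )
    return f"{instruction}\n\n{source.text}"
```

Letting the model decline when a description has no real steps is one way to avoid invented steps on caption-less videos, and the `from_transcript` flag can also drive a visible "generated from description" label on the page.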

j45|1 year ago

Comparing YouTube transcripts to open-source Whisper transcripts could be interesting, to see whether Whisper picks up anything extra.

There's little need to reinvent the wheel on audio processing when there are other problems left to solve.

cchance|1 year ago

I mean if CC is missing you just run it through whisper/whisperfast and you've got CC.

jghn|1 year ago

As someone who can’t stand the modern trend away from text and towards video, I can’t praise this idea enough. The number of circumstances where a video is better than text with some clarifying pictures is quite small.

SoftTalker|1 year ago

100% agree. Video can be helpful for supplementary illustration, e.g. to show exactly how to orient parts in an assembly, but often at the cost of sitting through a lot of rambling monologue that isn't helpful at all.

I haven't tried this yet but it would be helpful if each step included a link to the spot in the video where that step is shown, so that in case you need it it's easy to find.
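A minimal sketch of such links, assuming each generated step keeps the start time (in seconds) of the transcript segment it came from; the `Step` structure and helper names are hypothetical. It relies on YouTube's `t` query parameter, which starts playback at the given offset.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class Step:
    text: str
    start_seconds: float  # hypothetical: start of the transcript segment behind this step


def format_timestamp(seconds: float) -> str:
    """Render seconds as M:SS or H:MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}" if hours else f"{minutes}:{secs:02d}"


def step_link(video_id: str, start_seconds: float) -> str:
    """Link that opens the video at the given offset via the t parameter."""
    query = urlencode({"v": video_id, "t": f"{int(start_seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"


def render_steps(video_id: str, steps: list[Step]) -> str:
    """Numbered steps, each followed by its timestamp and a deep link."""
    lines = []
    for number, step in enumerate(steps, start=1):
        stamp = format_timestamp(step.start_seconds)
        link = step_link(video_id, step.start_seconds)
        lines.append(f"{number}. {step.text} (at {stamp}: {link})")
    return "\n".join(lines)
```

Truncating to whole seconds keeps the links short; starting a second or two early would be a reasonable tweak if steps tend to begin mid-sentence.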

mavamaarten|1 year ago

Yeah. The only way to find written instructions these days is to search Reddit specifically, which I'm not a big fan of either.

I've had multiple instances where I had a simple issue with zero decent Google results, and a YouTube result with literally my exact question in the title. I had to sift through 12 minutes of "like and subscribe" and a dude clicking around various screens mumbling some stuff... I would have been very happy with a simple blog post.

aka_sh|1 year ago

Totally agree with you on that. I hope this lives up to your expectations. Thank you!

mbesto|1 year ago

Super interesting. I recently went down the DIY rabbit hole for solar, electricity, etc. I tested out https://stepify.tech/video/O8eVxRVwlnw and it looks decent:

1. It took about 45 seconds for the page to load once I put the URL in. You should have a loader on the page showing that the website is "doing something" while the AI transcribes.

2. It would be great to sync the chapters in the YT video with the guide details.

3. Even more advanced would be having specific items, like "Drill holes, insert expansion bolts, and secure the inverter to the wall using nuts and washers.", show a timestamp and thumbnail with a link to that part of the video.

4. It would be great to have a checklist functionality (maybe this is the "pro version"). I often do something, get halfway and then need to scrub the YT video to find the specific place where he talks about the action item.

EDIT:

5. IMO iFixit has the best "guide" formatting: https://www.ifixit.com/Guide/How+to+Recover+Data+From+a+MacB... if you could somehow generate this by the video, that would be insanely useful.
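For the chapter sync in point 2, a sketch under stated assumptions: YouTube builds chapters from timestamp lines in the video description (the first one at 0:00), so parsing those lines and bucketing each step by its start time is one way to group the guide. The line format handled here and the helper names are assumptions, and steps are assumed to carry a start time in seconds.

```python
import bisect
import re
from typing import Optional

# Matches lines such as "0:00 Intro", "1:23 - Mount the inverter" or "1:02:05 Wrap up".
CHAPTER_LINE = re.compile(r"^\s*((?:\d+:)?\d{1,2}:\d{2})\s*-?\s*(.+?)\s*$")


def to_seconds(timestamp: str) -> int:
    """Convert M:SS or H:MM:SS to seconds."""
    total = 0
    for part in timestamp.split(":"):
        total = total * 60 + int(part)
    return total


def parse_chapters(description: str) -> list[tuple[int, str]]:
    """Return (start_seconds, title) pairs found in a description, sorted by start."""
    chapters = []
    for line in description.splitlines():
        match = CHAPTER_LINE.match(line)
        if match:
            chapters.append((to_seconds(match.group(1)), match.group(2)))
    return sorted(chapters)


def chapter_for(start_seconds: float, chapters: list[tuple[int, str]]) -> Optional[str]:
    """Title of the last chapter that begins at or before start_seconds."""
    starts = [start for start, _ in chapters]
    index = bisect.bisect_right(starts, start_seconds) - 1
    return chapters[index][1] if index >= 0 else None
```

Binary search keeps the lookup cheap even for long videos with many chapters; videos without timestamp lines simply yield an empty list, and the guide stays ungrouped.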

aka_sh|1 year ago

Great suggestions! I really appreciate your feedback. I'll work on implementing these as soon as possible.

sonnyw603|1 year ago

Check out this app called Razzl. It does pretty much what you’ve described.

makuchaku|1 year ago

Great work. A few ideas

1) Speed: the site is often showing Heroku errors. It seems like you're running the entire processing in the request-response cycle. If you haven't already, please try using a queueing system to do the processing asynchronously, and then let the user know when their video is ready to view as steps (probably via email or browser notifications). This will stop your site from crashing frequently, and you'll be able to scale to many users very quickly.

2) Please add link-backs to the specific time in the video where the step is shown.

Cheers!

makuchaku|1 year ago

Also, +1 to chapters as someone mentioned in the comments.

j45|1 year ago

Not sure if putting the site behind Cloudflare or something could help.

Heroku just wants a bigger bill.

aka_sh|1 year ago

Noted! I'll look into that. Thank you.

nickjj|1 year ago

Hi,

Is there a way to request that submitted items be removed? Can you provide a way to contact you, such as an email address? There wasn't one posted on your site.

It's just a suggestion, but right now anyone can submit anyone's videos without the creator's consent or any ownership verification. How do you plan to handle that? I'm sure there will be folks out there who wouldn't feel comfortable with a site scraping their video content to generate a large network of pages on one domain with loads of SEO terms. It creates a conflict of interest with the original creators: SEO competition, reduced views for the original creators, and then there's the other can of worms of any future plans to monetize your site through subscriptions, paid features or ads, where you'd be profiting from the content of others without their consent.

I posted one of my videos just to see what would happen, and it created a permanently hosted page on your domain with an AI-generated recap of the video. I didn't realize that was going to happen. There was no warning, no label explaining how it works, no TOS that I agreed to, and no option to make it private or delete it. I put in the URL, hit submit and that was it.

It's nothing personal and I hope you don't see this as a deterrent. I'm all for building cool things and generally openly share almost everything for free (I've been blogging and making videos for ~9 years and don't have a single ad on anything I ever posted) but the idea of having inaccurate AI generated content does rub me the wrong way.

> The guides are generated from pure transcript so you don't have to worry about it being AI.

You mentioned it's generated from pure transcripts but most of the phrases used aren't what was mentioned in the video. It looks like a paraphrased version of it but it's also missing all of the details that would allow someone to follow along.

Directly under the video on the page it says "This response is AI generated". On one hand you say it's not AI generated, but on the other hand it is.

meiraleal|1 year ago

Well, this place is called hackernews, after all. Information should be free, so if YouTube makes it public, public it should be.

Terretta|1 year ago

For the "Paid" or "Pro" version, let me have a browser extension that replaces ALL OF YOUTUBE with your text based breakdowns.

// I'm not really kidding! Because boy do I hate 15 minute videos with the one CLI command you need buried like a needle in a haystack. Seeing the nonsense distilled into a handful of straightforward steps is so refreshing. Awesome work!

j45|1 year ago

So true, you're after a few seconds buried in the video.

Giving the 15 seconds up front and then explaining it in more and more detail can also be appreciated by users.

layer8|1 year ago

You’d have to be lucky to get the correct and complete CLI command from the transcript though, unless this is also doing OCR, which I don’t think it is.

aka_sh|1 year ago

Thank you! I'll try implementing something like that and get back to you.

toddmorey|1 year ago

Love how the AI turned “drop a comment below” into a project step:

“Seek feedback from stakeholders or viewers by encouraging questions and comments for further engagement.”

This is from a bathroom remodel video.

aka_sh|1 year ago

Sorry about that, I'm looking into it. The problem is with videos that have no transcript. Maybe it's because I'm feeding it the video's description for now. I'll find some workaround for this. Thanks!

plufz|1 year ago

I made something a little similar, but just as a small CLI script that I run locally for myself. You can input a URL for a YouTube video, a podcast link or a local audio/video file. It transcribes it with Whisper and outputs the full transcript in one text file, and I use another model to summarize it into a bullet list in a separate file.

I so appreciate these open source/open access models allowing us to build these kinds of tools without having to pay and send our data to OpenAI.

whereismyacc|1 year ago

Doesn't youtube automatically transcribe every video with whisper?

ejang0|1 year ago

This seems like something people on HN have asked for before. I clicked on one of the Recent videos, about how to create a simple Flask app in 5 minutes, and the instructions seemed good at a cursory glance.

I tried entering a new video but I got a Heroku application error. Maybe it's a limits thing.

When I look at the Recent videos, a lot of them are not for instructions/tutorials. Perhaps people do not understand the purpose of this project. Maybe they are just testing it out with non-tutorial content.

Maybe you could add representative videos towards the top so that people would get a better sense of the use of this project?

I don't know why this isn't more popular here. It's a good idea. (Maybe it has already been implemented elsewhere?) Reading is much faster than watching a video for many instruction-based tasks. Good luck!

aka_sh|1 year ago

Yeah, you just said what's been on my mind since I launched it. The code I wrote is for tutorial videos; responses for non-tutorial videos are just gibberish. Putting representative videos at the top is a great idea. I'll look into it.

Can you tell me more about the video you entered? Did it have a transcript? How many hours long was it?

shortformblog|1 year ago

If you continue down this road, you should plan to fund the creators this is siphoning from, or give them some way to consent to it.

What you are doing is, whether you’ve considered this or not, at risk of harming people who are building around video because it is financially viable. People produce these guides as videos because that’s how they can make money from them, whereas it is much more difficult to do so on a website.

You need to consider the implications of what you’ve built.

_akhe|1 year ago

Hm, is this the right take? The YouTube player is embedded on the page, giving the creator YouTube views and more exposure. And I think when a person uploads to YouTube the idea is their video will be out there - including in embeds on 3rd party sites.

I just wouldn't use the word "siphoning" here. There are countless blog posts, news articles, how-to guides, etc. that will embed a video like this yet also provide supporting text for readers. I think it's a pretty normal way of sharing content.

I for one am not a person who learns by watching videos, step-by-step guides work better for me. The idea that all those video tutorials could be made available as text-based guides sounds actually very useful - and I would still be very aware of who originated that content as their video is embedded right there.

It would actually be great if, when I search for a tutorial and the most relevant result is a video, my browser could summarize that video the way search engines summarize results at the top or in the sidebar.

rfl890|1 year ago

Seems like we're adding colour to bits again.

userbinator|1 year ago

> The guides are generated from pure transcript so you don't have to worry about it being AI.

That just means you have to worry about voice recognition errors instead.

notahacker|1 year ago

True, but voice recognition errors typically involve an oddly out-of-place word or two, which you can usually spot and mentally correct. That's less likely to make you take the wrong series of steps than a completely coherent and topic-relevant "hallucinated" sentence that just happens to not be part of the guide at all.

Edit: although in this instance the LLM pretty heavily editorialises the transcript anyway...

javrin|1 year ago

You know what would save me more time? ...if I could search a database of stepified videos.

noashavit|1 year ago

This looks amazing! As a marketer, I often struggle with repurposing long video interviews into shorter tactical videos, and what you've built looks promising. I'm excited to check it out!

whereismyacc|1 year ago

You should probably rework the recent videos section? Or not. I mean, it's engagement, I guess, but I'm pretty sure people are intentionally putting silly videos on the page.

toddmorey|1 year ago

This is a great & useful resource! So many guides on YouTube are unfortunately padded with so much silliness and fluff. Would be great to link out to time codes if possible.

aka_sh|1 year ago

Thank you! Great suggestion. I'll try adding timecodes ASAP.

pedalpete|1 year ago

I could have used this on the weekend. I was working on my car, and though I had watched a few videos about removing the door, the electrical connections, etc., I missed some of the details, or had to make a mental note of "this, then this, not the other way around".

What might be a great addition is a screenshot for each point, though I'm not sure how you'd figure out which image best captures the action.
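One way to get a screenshot per step, sketched under assumptions: each step has a start time in seconds, and the frame a couple of seconds after that start is a reasonable guess at the action (the 2-second offset is arbitrary, not tuned). The function only builds `ffmpeg` commands: `-ss` before `-i` seeks the input before decoding, `-frames:v 1` keeps a single frame, and `-q:v 2` asks for high JPEG quality. The helper name and output file names are hypothetical.

```python
def frame_commands(video_path: str, step_starts: list[float],
                   offset_seconds: float = 2.0) -> list[list[str]]:
    """Build one ffmpeg command per step that saves a single JPEG frame."""
    commands = []
    for number, start in enumerate(step_starts, start=1):
        at = f"{start + offset_seconds:.2f}"  # seek position in seconds
        output = f"step_{number:02d}.jpg"
        commands.append([
            "ffmpeg", "-y",
            "-ss", at,            # seek before opening the input
            "-i", video_path,
            "-frames:v", "1",     # write exactly one video frame
            "-q:v", "2",          # JPEG quality (lower is better)
            output,
        ])
    return commands
```

Picking the best frame automatically is the hard part; grabbing a few candidates per step and letting a human (or an image model) choose would be a natural next step.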

geekraver|1 year ago

This is cool. I have been doing this a bit more manually, by using a Chrome plugin that does YT summaries and shows transcriptions using Claude. I don’t like those summaries so I paste the transcriptions into ChatGPT (GPT4) with a prompt “Provide detailed study notes of the following video transcript”. That gives me a very similar format to yours. Will have to do some side-by-side comparisons.
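Pasting a long transcript into a chat model runs into context limits. A sketch of splitting it into prompt-sized pieces that reuse the prompt quoted above; the 3,000-word budget is an arbitrary assumption (words are only a rough proxy for tokens), and the helper name is hypothetical.

```python
PROMPT = "Provide detailed study notes of the following video transcript"


def chunked_prompts(transcript: str, max_words: int = 3000) -> list[str]:
    """Split a transcript on word boundaries into chunks of at most max_words words,
    each prefixed with the study-notes prompt and its position in the sequence."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = transcript.split()
    chunks = [words[i:i + max_words] for i in range(0, len(words), max_words)]
    total = len(chunks)
    return [
        f"{PROMPT} (part {n} of {total}):\n\n{' '.join(chunk)}"
        for n, chunk in enumerate(chunks, start=1)
    ]
```

Labelling each piece "part n of total" gives the model a hint that the notes continue; the per-part notes can then be concatenated or summarized once more.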

gmerc|1 year ago

It’s tricky when you don’t do editorial on your homepage tho:

https://stepify.tech/video/623AC6a6org

is the first featured video…

In any case, it’s doomed: Google will cut off the access or integrate the feature on their side. They thank you for the proof of concept, though.

getwiththeprog|1 year ago

It is less tricky than watching a video on the subject. Very funny, but might not be a video you would want to watch at work (or home).

patal|1 year ago

It's funny though

fortran77|1 year ago

Interesting idea, but not quite useful. I tried two: how to replace a fiberglass window screen, and how to replace the "cycle clutch gear" on an IBM Selectric typewriter.

https://stepify.tech/video/KafAn1h4x14

Neither were good enough to use.

typpo|1 year ago

Great idea and congrats on shipping the project!

I'm curious if you noticed certain models worked better for summarizing and converting to steps. For example, in my projects I've found that Gemini outperforms "better" models like GPT for similar use cases, which I guess makes sense given Google's interest in summarization.

fransjorden|1 year ago

It looks like someone is flooding the service with questionable content (maybe to get you deranked by Google?)

cushychicken|1 year ago

Interesting; this is similar to an idea suggested by a Scott Galloway/Section weekly email.

1) record an SOP using Loom while you narrate, 2) grab a transcript of your narration, 3) feed transcript into ChatGPT to write list of instructions.
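Step 3 typically comes back as a numbered list in plain text. To hand it off as a trackable checklist (the checklist idea raised earlier in the thread), a sketch that parses lines like `1. Do X` or `2) Do Y` into items with a done flag; the expected output format and the `ChecklistItem` type are assumptions, since models do not always number steps consistently.

```python
import re
from dataclasses import dataclass

# A numbered line: "1. text" or "2) text", with optional leading whitespace.
NUMBERED = re.compile(r"^\s*(\d+)[.)]\s+(.+?)\s*$")


@dataclass
class ChecklistItem:
    number: int
    text: str
    done: bool = False


def parse_checklist(model_output: str) -> list[ChecklistItem]:
    """Keep only numbered lines; preambles and closing remarks are ignored."""
    items = []
    for line in model_output.splitlines():
        match = NUMBERED.match(line)
        if match:
            items.append(ChecklistItem(int(match.group(1)), match.group(2)))
    return items


def render(items: list[ChecklistItem]) -> str:
    """Plain-text checklist with [x] for finished steps."""
    return "\n".join(f"[{'x' if item.done else ' '}] {item.number}. {item.text}" for item in items)
```

Dropping un-numbered lines also filters out chatty filler such as "Let me know if this helps!", which otherwise tends to end up in the handoff document.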

Was billed as a way to easily hand off processes to contractors or subordinates.

This seems like a cool riff on that. Neat.

thih9|1 year ago

> Internal Server Error

> The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

Hugs all around. I'd take it as positive feedback. Congrats on the launch!

cvhashim04|1 year ago

Wow, you might be onto something. Saved.

How are you managing costs and offering this for free?

aka_sh|1 year ago

I am not. I'm from a third-world country, and trust me when I say I've burned through half of my paycheck in a few hours, which is barely three digits.

iamflimflam1|1 year ago

I think, to be fairer to the people actually creating the content, you should make the link back to the original video much more obvious.

aka_sh|1 year ago

I will. Could you suggest a place where it would be more obvious?

BooleanMaestro|1 year ago

This is an ingenious and practical use of LLM technology, I'm thoroughly impressed.

_akhe|1 year ago

Very awesome! Would be even neater if it pulled screenshots from the video for each step :)

DevNinjaS|1 year ago

It would be great if it was open source, as I might want to make some custom modifications.

culopatin|1 year ago

Ha I had this idea a few months ago and didn’t pursue it. Love it

christensen143|1 year ago

Looks like it might be down. Love this idea.

brycelarkin|1 year ago

Love the Filthy Frank survival guide!

burrish|1 year ago

That's a lot of sexual content on the front page... might want to moderate all that.

inatreecrown2|1 year ago

this is super helpful, thanks for making it! bookmarked.

Simon_ORourke|1 year ago

I've been looking for something like this for absolutely ages. If I want to figure out how to fix my cellphone, reset a warning sensor on my auto dashboard or more recently install a NAS box, there's always this long winded YouTube video packed full of ads. Thanks for helping cut through this nonsense.

aka_sh|1 year ago

Appreciate the kind words. This really means a lot.