Show HN: I made a website that converts YT videos into step-by-step guides
271 points| aka_sh | 1 year ago |stepify.tech
I've been working on this side project for the past month. It generates a step-by-step tutorial guide for YouTube videos that you can follow along without watching long videos. Best suited for tutorial videos but can work for other videos as well. No BS. Just straight to the point.
The guides are generated from pure transcript so you don't have to worry about it being AI. It's my first project as a total beginner. Something I had to do in order to get out of tutorial hell.
Please let me know if you have any suggestions or if you face any problems or bugs. I would try to fix them to the best of my abilities and as soon as possible.
I would appreciate your feedback on this. Let me know what you think!
metadat|1 year ago
One question- On the backend, is it downloading each video CC (closed-caption) transcript and feeding that into a tuned prompt? What happens for videos where this is missing? Asking because I've noticed CC is occasionally unavailable for some YouTube videos.
If you cared to have a fallback, a potentially interesting experiment / solution for such cases is to download the video, extract the audio to a WAV file, then run the audio through Whisper [1] to generate the transcript. On CPUs it will still be incredibly intensive and slow, generally not much faster than real time (e.g. a 5 minute clip will take on the order of ~5 minutes to transcribe). However, with Whisper running on a fancy GPU it is insanely faster, between 100-200x, meaning even for long videos the transcript will be ready in only a few seconds.
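A minimal sketch of that fallback, assuming `ffmpeg` is on the PATH and the `openai-whisper` package is installed. The function names, the `base` model choice, and the 16 kHz mono settings are illustrative assumptions, not anything stepify.tech is known to use:

```python
import subprocess
from pathlib import Path


def ffmpeg_wav_command(video_path: str, wav_path: str) -> list[str]:
    """Build an ffmpeg command that extracts 16 kHz mono WAV audio."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-vn",            # drop the video stream
        "-ac", "1",       # downmix to mono
        "-ar", "16000",   # resample to 16 kHz
        wav_path,
    ]


def estimated_seconds(audio_seconds: float, speedup: float = 1.0) -> float:
    """Rough wall-clock estimate; speedup=1.0 means real time."""
    return audio_seconds / speedup


def transcribe(video_path: str, model_name: str = "base") -> str:
    """Extract the audio with ffmpeg, then transcribe it with Whisper."""
    import whisper  # pip install openai-whisper; imported lazily on purpose

    wav_path = str(Path(video_path).with_suffix(".wav"))
    subprocess.run(ffmpeg_wav_command(video_path, wav_path), check=True)
    model = whisper.load_model(model_name)
    return model.transcribe(wav_path)["text"]
```

With the rough numbers above, `estimated_seconds(300)` gives 300 seconds for a 5 minute clip at real-time speed, and `estimated_seconds(300, speedup=100)` gives 3 seconds at the low end of the GPU range.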
Great job @aka_sh!
[1] https://github.com/openai/whisper
p.s. Is there any chance you'd open source your code? Or do you plan to turn this into a business? The code itself isn't exactly a huge moat, and it'd be cool to see how you did this. Cheers.
p.p.s. stepify.tech app is currently crashing out to a heroku error page when I try to submit a YT link.
aka_sh|1 year ago
Yannael|1 year ago
redbell|1 year ago
j45|1 year ago
There is limited need to reinvent the wheel to process audio when other things can be solved.
cchance|1 year ago
jghn|1 year ago
SoftTalker|1 year ago
I haven't tried this yet but it would be helpful if each step included a link to the spot in the video where that step is shown, so that in case you need it it's easy to find.
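One way this could work, sketched under the assumption that each step keeps the start time (in seconds) of the transcript segment it came from. `step_link` and `VIDEO_ID` are placeholders for illustration:

```python
from urllib.parse import urlencode


def step_link(video_id: str, start_seconds: float) -> str:
    """Return a YouTube watch URL that starts playback at the step's timestamp."""
    query = urlencode({"v": video_id, "t": f"{int(start_seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"


print(step_link("VIDEO_ID", 754.6))
# https://www.youtube.com/watch?v=VIDEO_ID&t=754s
```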
mavamaarten|1 year ago
I've had multiple instances where I had a simple issue with zero decent Google results, and a YouTube result with literally the exact question I had in the title. I had to sift through 12 minutes of "like and subscribe", a dude clicking around in various screens mumbling some stuff... I would have been very happy with a simple blog post
aka_sh|1 year ago
mbesto|1 year ago
1. It took about ~45 seconds for the page to load once I put the URL in. You should have a loader on a page showing that the website is "doing something" while the AI transcribes.
2. It would be great to sync the chapters in the YT video with the guide details.
3. Even more advanced would be if specific items like "Drill holes, insert expansion bolts, and secure the inverter to the wall using nuts and washers." showed a timestamp and thumbnail with a link to that part of the video.
4. It would be great to have a checklist functionality (maybe this is the "pro version"). I often do something, get halfway and then need to scrub the YT video to find the specific place where he talks about the action item.
EDIT:
5. IMO iFixit has the best "guide" formatting: https://www.ifixit.com/Guide/How+to+Recover+Data+From+a+MacB... If you could somehow generate this from the video, that would be insanely useful.
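For point 2, a rough sketch of how steps might be grouped under video chapters, assuming both the chapter list and each step's start time are already available in seconds. The chapter titles and times below are made up for illustration:

```python
from bisect import bisect_right


def chapter_for(step_start: float, chapters: list[tuple[float, str]]) -> str:
    """Return the title of the chapter that contains step_start.

    chapters is a list of (start_seconds, title) sorted by start time.
    """
    starts = [start for start, _ in chapters]
    index = bisect_right(starts, step_start) - 1
    return chapters[max(index, 0)][1]


# Hypothetical chapters for an inverter installation video.
chapters = [(0, "Intro"), (95, "Mounting the inverter"), (410, "Wiring")]

print(chapter_for(120, chapters))  # Mounting the inverter
```

The same start time could also drive the timestamp link and thumbnail from point 3.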
aka_sh|1 year ago
sonnyw603|1 year ago
makuchaku|1 year ago
1) Speed : the site is often showing heroku errors. Seems like you are running the entire processing in the request-response cycle. If not already done, please try to use a queueing system to perform async processing - and then let the user know when their video is ready to view as steps (probably via email or browser notifications). This will stop your site from crashing frequently and you'll be able to scale to many users very quickly.
2) Please add link-backs to the specific time in the video from where the step is shown.
Cheers!
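A toy sketch of the idea in point 1, using only the Python standard library. Everything here is hypothetical: `generate_guide` stands in for the slow transcript-to-steps work, and a real deployment would more likely use a separate worker process with a persistent queue rather than an in-process thread:

```python
import queue
import threading
import uuid

jobs: dict[str, dict] = {}
pending: "queue.Queue[str]" = queue.Queue()


def generate_guide(video_url: str) -> str:
    """Placeholder for the slow transcript-to-steps work."""
    return f"guide for {video_url}"


def submit(video_url: str) -> str:
    """Called from the request handler: enqueue the job and return immediately."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"url": video_url, "status": "queued", "result": None}
    pending.put(job_id)
    return job_id


def worker() -> None:
    """Runs outside the request-response cycle and processes jobs one by one."""
    while True:
        job_id = pending.get()
        job = jobs[job_id]
        job["status"] = "running"
        try:
            job["result"] = generate_guide(job["url"])
            job["status"] = "done"
        except Exception as exc:
            job["result"] = str(exc)
            job["status"] = "failed"
        finally:
            pending.task_done()


threading.Thread(target=worker, daemon=True).start()
```

The web handler would return the job id right away, and the page (or an email / browser notification) would check `jobs[job_id]["status"]` until it reads `done`.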
makuchaku|1 year ago
j45|1 year ago
Heroku just wants a bigger bill.
aka_sh|1 year ago
nickjj|1 year ago
Is there a way to request that submitted items be removed? Can you provide a way to contact you, such as an email address? There wasn't one posted on your site.
It's just a suggestion, I mean right now anyone can submit anyone's videos without their consent or ownership verification. How do you plan to handle that? I'm sure there will be folks out there who wouldn't feel comfortable that a site will be scraping their video content attempting to generate a large network of pages on 1 domain with loads of SEO terms. It provides a conflict of interest with the original creators. This conflict of interest is around SEO competition, reducing views from original creators and then there's the other can of worms of any future plans to monetize your site through subscriptions, paid features or ads where you'd be profiting from the content of others without their consent.
I posted one of my videos just to see what would happen and then it created a permanently hosted page on your domain with an AI generated recap of the video. I didn't realize that was going to happen. There was no warning, label of how it works, TOS that I agreed to or options available to make it private and there's no option to delete it. I put in the URL, hit submit and that was it.
It's nothing personal and I hope you don't see this as a deterrent. I'm all for building cool things and generally openly share almost everything for free (I've been blogging and making videos for ~9 years and don't have a single ad on anything I ever posted) but the idea of having inaccurate AI generated content does rub me the wrong way.
> The guides are generated from pure transcript so you don't have to worry about it being AI.
You mentioned it's generated from pure transcripts but most of the phrases used aren't what was mentioned in the video. It looks like a paraphrased version of it but it's also missing all of the details that would allow someone to follow along.
Directly under the video on the page it says "This response is AI generated". On one hand you say it's not AI generated but then on the other hand it is.
meiraleal|1 year ago
anonymouse008|1 year ago
I hope that didn’t wreck your compute costs
aka_sh|1 year ago
mrbluecoat|1 year ago
Terretta|1 year ago
// I'm not really kidding! Because boy do I hate 15 minute videos with the one CLI command you need buried like a needle in a haystack. Seeing the nonsense distilled into a handful of straightforward steps is so refreshing. Awesome work!
j45|1 year ago
Giving the 15 seconds up front and then explaining it in more and more detail can also be appreciated by users.
layer8|1 year ago
aka_sh|1 year ago
toddmorey|1 year ago
“Seek feedback from stakeholders or viewers by encouraging questions and comments for further engagement.”
This is from a bathroom remodel video.
aka_sh|1 year ago
plufz|1 year ago
I so appreciate these open source/access models allowing us to build these kinds of tools without having to pay and send our data to openai.
whereismyacc|1 year ago
ejang0|1 year ago
I tried entering a new video but I got a Heroku application error. Maybe it's a limits thing.
When I look at the Recent videos, a lot of them are not for instructions/tutorials. Perhaps people do not understand the purpose of this project. Maybe they are just testing it out with non-tutorial content.
Maybe you could add representative videos towards the top so that people would get a better sense of the use of this project?
I don't know why this isn't more popular here. It's a good idea. (Maybe it has already been implemented elsewhere?) Reading is much faster than watching a video for many instruction-based tasks. Good luck!
aka_sh|1 year ago
Can you tell me more about the video you entered? Did it have a transcript? How many hours long was it?
shortformblog|1 year ago
What you are doing is, whether you’ve considered this or not, at risk of harming people who are building around video because it is financially viable. People produce these guides as videos because that’s how they can make money from them, whereas it is much more difficult to do so on a website.
You need to consider the implications of what you’ve built.
_akhe|1 year ago
I just wouldn't use the word "siphoning" here. There are countless blog posts, news articles, how-to guides, etc. that will embed a video like this yet also provide supporting text for readers. I think it's a pretty normal way of sharing content.
I for one am not a person who learns by watching videos, step-by-step guides work better for me. The idea that all those video tutorials could be made available as text-based guides sounds actually very useful - and I would still be very aware of who originated that content as their video is embedded right there.
It would actually be great if when I search for a tutorial and the most relevant result is a video, if my browser could summarize that video the way search engines summarize results at the top or in the side bar.
rfl890|1 year ago
toddmorey|1 year ago
aka_sh|1 year ago
userbinator|1 year ago
That just means you have to worry about voice recognition errors instead.
notahacker|1 year ago
Edit: although in this instance the LLM pretty heavily editorialises the transcript anyway...
javrin|1 year ago
noashavit|1 year ago
whereismyacc|1 year ago
iamflimflam1|1 year ago
https://stepify.tech/video/1-Rm0mgg2RI
Here's the video for reference:
https://www.youtube.com/watch?v=1-Rm0mgg2RI
aka_sh|1 year ago
toddmorey|1 year ago
aka_sh|1 year ago
pedalpete|1 year ago
What I think might be a great addition is if you had a screenshot for each point? Though I'm not sure how you'd figure out which image would best capture the action.
geekraver|1 year ago
unknown|1 year ago
[deleted]
gmerc|1 year ago
https://stepify.tech/video/623AC6a6org
is the first featured video…
In any case, it’s doomed- google will cut off the access or integrate the feature on their side. They thank you for the proof of concept though.
getwiththeprog|1 year ago
patal|1 year ago
fortran77|1 year ago
https://stepify.tech/video/KafAn1h4x14
Neither were good enough to use.
typpo|1 year ago
I'm curious if you noticed certain models worked better for summarizing and converting to steps. For example, in my projects I've found that Gemini outperforms "better" models like GPT for similar use cases, which I guess makes sense given Google's interest in summarization.
fransjorden|1 year ago
cushychicken|1 year ago
1) record an SOP using Loom while you narrate, 2) grab a transcript of your narration, 3) feed transcript into ChatGPT to write list of instructions.
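Step 3 could look roughly like this. It is only a sketch of building the prompt text; the wording of the prompt and the `max_steps` parameter are assumptions, and the actual call to a chat model is left out:

```python
def build_prompt(transcript: str, max_steps: int = 10) -> str:
    """Turn a raw narration transcript into a prompt asking for numbered steps."""
    cleaned = " ".join(transcript.split())  # collapse line breaks and extra spaces
    return (
        f"Rewrite the following narration as a numbered list of at most "
        f"{max_steps} instructions. Use only information from the transcript.\n\n"
        f"Transcript:\n{cleaned}"
    )
```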
Was billed as a way to easily hand off processes to contractors or subordinates.
This seems like a cool riff on that. Neat.
ghoulishly|1 year ago
notahacker|1 year ago
[deleted]
thih9|1 year ago
> The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
Hugs all around - I'd take it as positive feedback. Congrats on the launch!
acordier16|1 year ago
see: https://news.ycombinator.com/item?id=40112792
hexidectom|1 year ago
cvhashim04|1 year ago
How are you managing costs and offering this for free?
aka_sh|1 year ago
iamflimflam1|1 year ago
aka_sh|1 year ago
unknown|1 year ago
[deleted]
BooleanMaestro|1 year ago
_akhe|1 year ago
DevNinjaS|1 year ago
culopatin|1 year ago
christensen143|1 year ago
robblbobbl|1 year ago
brycelarkin|1 year ago
burrish|1 year ago
inatreecrown2|1 year ago
Simon_ORourke|1 year ago
aka_sh|1 year ago
youcantcook|1 year ago
[deleted]