Now I can just print that video

quartz|2 years ago

Definitely would use this.

Instructional video instead of step-by-step text is a personal pet peeve. I know it's a lot easier to just record a video to show something like "how to replace the battery on a cordless vacuum" or "removing a sink basin nut" but it's often such a painful experience for consumption (watch a moment, pause, scrub back and watch again, pause, continue, pause, all with potentially gloved hands often in tight working spaces).

bane|2 years ago

I'm on the other end of this in a way. I think it may come from having to read and write all day every day. Sometimes just having somebody yak at me for a few minutes is useful.

I really enjoy watching instructional videos, especially for recipes. The demo of the cooking techniques is almost always hard to write or talk about, and easy to show.

In the kitchen it works this way for me:

1. Watch the video once or twice all the way through to "learn it" and decide if it's what I want to do.

2. Put together my mise en place and basic prep for the recipe. Learning to do this was a game changer.

3. Finally, put it on my phone or tablet in my kitchen and let it play while I work. It's mostly audio at this point, as I've "seen" the content a few times and I'm just listening as if the video is a coach. I'll hit pause at the major steps, and scrub back if I need a refresher on a technique or step.

I've gotten through some very complex dishes this way, and never hit the equivalent rhythm using cookbooks or recipe websites. The audio part of step 3 is really critical to me, as it helps me focus on the food rather than remembering all the steps, and it just fills up the background space in my kitchen or acts as a coach. The only way it would be better for me is if it automatically paused after each step and I could then ask it "what next?" or "go back two steps, I missed a step" or some other audio prompt.

joncalhoun|2 years ago

People have a mental cap on what text should cost. If someone creates instructional content that provides thousands of dollars in value, they can sell videos for $200+, but a book version is hard to sell over $50, even if both provide the same value. Even for free content it is easier to monetize YouTube than it is to monetize a blog.

If we want people to create more text-based material, it needs to have similar financial incentives.

Szpadel|2 years ago

oh, it depends

I understand and agree with you but there are situations where full video is better anyways.

example from life: I needed to tear down an old laptop to replace the thermal paste, and I was following an image guide. it was all fine until one part got stuck and I couldn't figure out what was holding it. there was no way to figure that out from the description and images, so I needed to find a video.

I guess what I'm trying to say is that ideally you want both, or maybe a hybrid? like a step-by-step guide constructed from short looped videos, each showing you how to do that single step?

JohnFen|2 years ago

I agree 100% with this. Having both video and text is ideal, but if it has to be one or the other, text is much better.

attentive|2 years ago

Bard can do this. They have a YouTube extension.

sonicanatidae|2 years ago

100% with you on this one. Just give me the list. If I need visuals, I'll chase them down.

dheera|2 years ago

Another big annoyance is websites with recipes that do any of the following incredibly bad UX patterns:

- Big white page not showing any text or images until the entire page and its assets are downloaded, which means if you accidentally click something and go back you have to wait another several seconds for everything to load again

- Pop up GDPR popup while hands are covered in flour and eggs

- Pop up "would you like to subscribe to the newsletter" while hands are covered in sticky sauce

- Pop up "buy this shit for 10% off" with a microscopic X button while something is on high heat on the stove

- Not specifying image height and width in CSS so that when the user is looking at a piece of text and images above it load, the scrolling position jumps

For these reasons alone I've largely stopped looking at the internet for recipes and turned to physical books, which are much better behaved.

atticora|2 years ago

I saw a YouTube video by a guy who specializes in building D&D characters. He spends twenty minutes going into detail on each one, and then makes the pitch for subscribing to his Patreon account with something like "members get all the details in a convenient list so that you don't have to keep going back to this video."

So he's using the same bit of friction that this article is trying to solve, to fill his rice bowl. It's a bit of a shame that fixing this problem for me will cause one for him.

TeMPOraL|2 years ago

> So he's using the same bit of friction that this article is trying to solve, to fill his rice bowl.

You spelled out exactly what the attention economy is about. Friction. The money is made on friction. Waste - of time, of cognitive effort, of emotions good and bad.

I feel sorry for this guy, but at the same time, I wish people recognized that attention economy isn't about some nebulous attention you have too much of and don't feel when it's being taken. On the contrary, attention is stolen through friction, and the sum of everyone who "fills their rice bowls" this way is why the web and so many processes and activities on-line feel like shit and remain painfully wasteful.

slingnow|2 years ago

Maybe if your business model includes putting things in an inconvenient format that could best be replaced by a bulleted list, you should rethink your business model.

EricMausler|2 years ago

Have you considered paying for the patreon regardless because you consume his content one way or another and value it?

thegabriele|2 years ago

Do you remember what the channel was? Thanks

binarymax|2 years ago

No need to spend hours trying to get the text extraction just right - pass the raw extraction into GPT and ask for it to give you the recipe.

kevincox|2 years ago

I was thinking the same thing. Extraction and basic formatting of information from human language is something that LLMs excel at. Especially if the result is being shown to a human so small mistakes can be tolerated.

pforret|2 years ago

Thanks for the tip! I will add GPT to the mix to clean up the speech and title data.

cloudking|2 years ago

It's a very cool technical feat, but not something I would personally pay for. I'll just spend the 1-2 minutes to watch the video for free. Not trying to discourage you, just giving honest feedback. Launching the early landing page is a good idea to validate further.

avgcorrection|2 years ago

I could also use a service for trimming all of the fat from how-to articles.

> We’ve all been there: we used the florb for too many glorbs and now it needs to be replaced. [...]

> This is an experience that everyone at the staff of howto.biz.uk has had! [...]

> But how do you replace a used-up florb? In this article we are going to show you how. [...]

> [scan the next five paragraphs]

NikxDa|2 years ago

Same with recipes, where the author frequently feels the need to reiterate their great-great-grandparents' life history in 10 paragraphs before getting to the ingredients and step-by-step instructions. It's sad to see what SEO has become, really...

gsa|2 years ago

This is pretty cool but I'd like to see a well-formatted recipe, not a transcript. I prefer the markdown format for recipes so I worked on something like this earlier this year [0]. It fetches YouTube subs (with no audio processing of the video itself, unlike this project) and returns a markdown file with ingredients and steps.

[0] https://github.com/gaganpreet/summarise-youtube-recipes

TrevorJ|2 years ago

As someone whose learning was significantly accelerated by the "written tutorial" phase of the internet, this would be a really great little tool. I find video tutorials to be far more cumbersome than text + images.

RBerenguel|2 years ago

I kind of wrote something for this a few years ago: https://github.com/rberenguel/glancer [edited a fat-fingered copy-paste]

The use-case is technical videos (like from conferences) I’m interested in, but not enough to invest 20-60 minutes.

Haven’t used it in a few months so the yt-dlp commands may need updating.

dan-g|2 years ago

Sadly getting a 404 here–maybe this is a private repository?

polygamous_bat|2 years ago

You can also use software to detect “cuts” in the video, which can be used to improve the frame-extraction over just getting six evenly spaced frames from the video.
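A minimal sketch of that idea, assuming ffmpeg's `select` filter and its `scene` change score are available in your build. The function name, the 0.3 threshold, the output pattern and the `recipe.mp4` file name are illustrative assumptions, not anything taken from the article's pipeline. It only builds the command, so you can inspect it before running it.

```python
def scene_cut_command(video, out_pattern="frame_%03d.png",
                      threshold=0.3, legacy_vsync=False):
    """Build an ffmpeg command that keeps only frames right after a cut.

    `scene` is a 0..1 score ffmpeg computes per frame; frames scoring above
    `threshold` are treated as the start of a new shot. The threshold is a
    guess and usually needs tuning per video.
    """
    # Variable frame rate output stops ffmpeg from duplicating frames to
    # fill the gaps between selected ones. Older builds spell this
    # "-vsync vfr"; newer ones prefer "-fps_mode vfr".
    sync_flag = ["-vsync", "vfr"] if legacy_vsync else ["-fps_mode", "vfr"]
    return [
        "ffmpeg",
        "-i", video,
        "-vf", f"select='gt(scene,{threshold})'",
        *sync_flag,
        out_pattern,
    ]


if __name__ == "__main__":
    cmd = scene_cut_command("recipe.mp4")  # hypothetical input file
    print(" ".join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

A lower threshold yields more frames (more sensitive to small changes), a higher one yields fewer, so it is worth trying a few values on a short clip first.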

unknown|2 years ago

[deleted]

anewhnaccount2|2 years ago

This is a task called "video summarization". See https://paperswithcode.com/task/video-summarization . I guess the whole project is something like summarizing from video + subtitles + text to pictures + text.

patates|2 years ago

Not the post author but I tried this with ffmpeg and failed. Do you (does anyone) want to share some pointers?

hermannj314|2 years ago

Do video formats support structured meta data to be embedded in them?

If I make a video of me cooking, can I embed the recipe in the video, etc.? Not just visually, but e.g. at 10s, I digitally insert the data "Add 1 cup red peppers". It isn't necessarily a caption of something said or shown, just extra data.

Could a video creator leave substantially more metadata in their videos? I always assumed the pop-up metadata was externally stored and timestamp synced. Is there a way to embed it?

pforret|2 years ago

That sounds a bit like subtitles, or Timed Text. There are simple formats (just a text and the moment it should appear) but some formats support changing the position, color, font… Most of the time this would be stored in an extra sidecar file like an .srt or a .sub
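To make the "text plus the moment it should appear" idea concrete, here is a rough sketch of reading a SubRip (.srt) sidecar file in Python. The `Cue` class, the `parse_srt` name and the sample cues are made up for illustration (only the "Add 1 cup red peppers" line echoes the question above), and it handles just the basic index / timing / text layout, not styling tags.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:10,000 --> 00:00:14,000".
_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def _seconds(h, m, s, ms):
    """Convert the four captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt):
    """Return the cues in an SRT string, in file order."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                g = match.groups()
                text = "\n".join(lines[i + 1:]).strip()
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break  # everything after the timing line is cue text
    return cues


# Hypothetical sidecar content for a cooking video.
SAMPLE = """1
00:00:10,000 --> 00:00:14,000
Add 1 cup red peppers

2
00:00:32,500 --> 00:00:36,000
Stir for two minutes
"""

if __name__ == "__main__":
    for cue in parse_srt(SAMPLE):
        print(f"{cue.start:>6.1f}s  {cue.text}")
```

Nothing stops a creator from putting recipe steps rather than spoken words into such cues, which is roughly what the question was asking for.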

thomastjeffery|2 years ago

It would be better all-around to just have that data in a separate file with timestamps.

jsharf|2 years ago

Recommend passing the speech-to-text narration through a round of GPT4 API to correct for any transcription errors (use some prompt giving context that it's speech to text)

xnzakg|2 years ago

Wonder if Kagi's universal summarizer would work on recipe videos. It seems to do a decent job on YouTube videos, but those usually have cc built in.

barrkel|2 years ago

Great, a way to turn videos into something I can scan. Actually something I'd consider using.

jusquan|2 years ago

This is great, thank you for sharing! I wonder what the reverse would look like. More and more nowadays, I find myself first looking on YouTube for tutorials and walkthroughs, even if they wind up being more verbose than their written counterparts.

pforret|2 years ago

Using yt-dlp, ffmpeg and various AI services to print videos (e.g. cooking IG reels)

tgsovlerkhgsel|2 years ago

Based on the example shown on the page, the output doesn't seem very good. If that's one of the better examples the software produced, I don't think this will be useful in practice.

pforret|2 years ago

This is one of the first results. The third, if I remember correctly.

I got this running yesterday (Sunday), and I wanted to write the blog post first to test if there was any interest in this topic. Apparently, yes. Now I only have to do the remaining 80% ;-)

lucubratory|2 years ago

An evolution of this process would make it feasible to do retrieval-augmented generation using information from video content. I've thought about trying to do this to improve the (already impressive) abilities LLMs possess as a creative writing assistant/rubber ducky; a lot of good writing advice is on YouTube in the form of video essays, tutorials, lectures, etc.

paledot|2 years ago

The copyright notice on the output is a poor choice, since you almost certainly do not own the copyright to any of the content. You've gone to impressive lengths to ensure that the result is true to the source material, which means that there is no claim to this being a transformative work.

(Very cool and useful project, though.)

ForOldHack|2 years ago

Ha! Print that video? Yes, but can you FIND THE PRINTER? ---- I humbly apologize, I thought this was some joke, or errant stupidity. It's not. This person has put some very serious thought into not only getting it to work, but making it useful. Very useful. You have earned my Upvote, and recommendation. Thank you Mr Forret. Thank you.

zoomablemind|2 years ago

If the main challenge was 'not having the smartphone in the kitchen', then one possible solution could have been getting another screen dedicated to the kitchen. A tablet, a laptop, a small TV+Google Cast or such combination.

It seems to be a proper medium for 'printing' a video.

Of course, choosing challenges and finding solutions is what drives fun.

chankstein38|2 years ago

To me the main problem this solves is having to rewatch the video over and over for each step. Most of the time it's like "Step 2: do thing" then quickly cuts to step 3 well before I could've finished step 2. So having it laid out like this is actually a decent format to receive recipes in.

goda90|2 years ago

A device I think would be great for the kitchen is a large wrist-mounted, waterproof e-ink screen, curved to wrap around the wrist, with two large scroll buttons.

The recipe could be loaded up via a linked smartphone or something, but then you have a device that you can touch with food covered hands and then wash it right alongside your hands later. Big screen so you don't have to squint or scroll frequently like you would on a smartwatch. E-ink so it works well despite bright kitchen lights and has low power consumption.

roomey|2 years ago

These TikTok videos are pretty short, right? Why not just get a notebook and write down the instructions?

You could even do a little line drawing of the important bits.

You could keep this "cook" book in your kitchen, and maybe pass it to one of your kids (just an example) when they move out or something.

IgorPartola|2 years ago

I actually wonder if in the limit of video encoding we could just get a diffusion model that can in real time render realistic video based on a script. Then downloading a movie is just downloading a few megabytes of a prompt and you get a movie playing based off it locally.

TeMPOraL|2 years ago

Maybe. The only problem I see is economic. Sure, sending over a sequence of prompts, instead of a sequence of frames, is going to be a huge storage and bandwidth saver. However, you're going to pay for it dearly, in compute, whenever you want to watch such a live-generated video. In almost all cases, it's vastly better to use more storage than to use more compute, for the same reason that, if you need something to stay above ground level, you're better off placing it on a table or bolting it to a wall, instead of attaching it to a jet engine pointing downwards, firing for TWR=1.

parthianshotgun|2 years ago

Wouldn't it be non-deterministic? (Legit question, I'm new to this)

adr1an|2 years ago

Cool! I had the same project idea recently. You may be interested in this for the step of speech2text: https://github.com/SYSTRAN/faster-whisper

ada1981|2 years ago

I think you could send all of that to GPT4 and ask it to read it and provide you with step-by-step instructions and a recipe, and it would do so easily.

I didn’t see how that print out would be super useful, it’s not the complete step by step is it?

einpoklum|2 years ago

Ok, so:

* It does not print the video frames as a 3D object.

* Despite what the graphic at the link suggests, it doesn't 3D-print food

it extracts a recipe with images and text from a video, automatically.

incahoots|2 years ago

Oh wow....this will be incredibly useful for the influx of home improvement videos I've been watching lately.

1970-01-01|2 years ago

Filtering a video for true content is the real app. Print is simply the format you've chosen to express it.

mannyv|2 years ago

If there are YouTube-generated captions you can get yt-dlp to download them when you download the video.
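Something like the following, as a sketch: it builds a yt-dlp command line from Python using the `--write-subs`, `--write-auto-subs`, `--sub-langs` and `--convert-subs` options. The function name and the URL are placeholders I made up, and the language default is an assumption.

```python
def caption_download_command(url, lang="en", auto_captions=True):
    """Build a yt-dlp command that fetches captions alongside the video."""
    cmd = [
        "yt-dlp",
        "--write-subs",            # uploader-provided subtitles, if any
        "--sub-langs", lang,       # which caption language(s) to fetch
        "--convert-subs", "srt",   # normalise whatever format comes back
    ]
    if auto_captions:
        # Also accept YouTube's automatically generated captions.
        cmd.append("--write-auto-subs")
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    # Placeholder URL; substitute a real video link.
    cmd = caption_download_command("https://example.com/watch?v=VIDEO_ID")
    print(" ".join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Adding `--skip-download` to the list would fetch only the caption file, which is handy if all you want is the text.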

benob|2 years ago

For some reason I thought the goal was to print (with a 3d printer) a 3d projection of the 4d content of the video. That would be cool...

a1o|2 years ago

I thought it would just print the dessert in a way I could eat. It would be much easier. :P

unknown|2 years ago

[deleted]

Hugsun|2 years ago

Great work! It's potentially useful and also hilarious.

Gys|2 years ago

Could have been a Show HN

119 comments