Instructional video instead of step-by-step text is a personal pet peeve. I know it's a lot easier to just record a video to show something like "how to replace the battery on a cordless vacuum" or "removing a sink basin nut" but it's often such a painful experience for consumption (watch a moment, pause, scrub back and watch again, pause, continue, pause, all with potentially gloved hands often in tight working spaces).
I'm on the other end of this in a way. I think it may come from having to read and write all day every day. Sometimes just having somebody yak at me for a few minutes is useful.
I really enjoy watching instructional videos, especially for recipes. The demo of the cooking techniques is almost always hard to write or talk about, and easy to show.
In the kitchen it works this way for me:
1. Watch the video once or twice all the way through to "learn it" and decide if it's what I want to do.
2. Put together my mise en place and basic prep for the recipe. Learning to do this was a game changer.
3. Finally, put it on my phone or tablet in my kitchen and let it play while I work, it's mostly audio at this point as I've "seen" the content a few times but I'm just listening as if the video is a coach. I'll hit pause at the major steps, and scrub back if I need a refresher on a technique or step.
I've gotten through some very complex dishes this way, and never hit the equivalent rhythm using cookbooks or recipe websites. The audio part of step 3 is really critical to me as it helps me focus on the food rather than remembering all the steps, and it just fills up the background space in my kitchen or acts as a coach. The only way it would be better for me is if it automatically paused after each step and I could then ask it "what next?" or "go back two steps, I missed a step" or some other audio prompt.
People have a mental cap on what text should cost. If someone creates instructional content that provides thousands of dollars in value, they can sell videos for $200+, but a book version is hard to sell over $50, even if both provide the same value. Even for free content it is easier to monetize YouTube than it is to monetize a blog.
If we want people to create more text-based material, it needs to have similar financial incentives.
I understand and agree with you but there are situations where full video is better anyways.
Example from life:
I needed to tear down an old laptop to replace the thermal paste, and I was following an image guide.
It was all fine until one part got stuck and I couldn't figure out what was holding it. There was no way to figure that out from the description and images; I needed to find a video.
I guess what I'm trying to say is that ideally you want both, or maybe hybrid? like step by step guide constructed from short looped videos showing you how to do that single step?
Another big annoyance is recipe websites that use any of the following incredibly bad UX patterns:
- Big white page not showing any text or images until the entire page and its assets are downloaded, which means if you accidentally click something and go back you have to wait another several seconds for everything to load again
- Pop up GDPR popup while hands are covered in flour and eggs
- Pop up "would you like to subscribe to the newsletter" while hands are covered in sticky sauce
- Pop up "buy this shit for 10% off" with a microscopic X button while something is on high heat on the stove
- Not specifying image height and width in CSS so that when user is looking at a piece of text and images above it load, the scrolling position jumps
For these reasons alone I've largely stopped looking at the internet for recipes and turned to physical books, which are much better behaved.
I saw a YouTube video by a guy who specializes in building D&D characters. He spends twenty minutes going into detail on each one, and then makes the pitch for subscribing to his Patreon account with something like "members get all the details in a convenient list so that you don't have to keep going back to this video."
So he's using the same bit of friction that this article is trying to solve, to fill his rice bowl. It's a bit of a shame that fixing this problem for me will cause one for him.
> So he's using the same bit of friction that this article is trying to solve, to fill his rice bowl.
You spelled out exactly what the attention economy is about. Friction. The money is made on friction. Waste - of time, of cognitive effort, of emotions good and bad.
I feel sorry for this guy, but at the same time, I wish people recognized that attention economy isn't about some nebulous attention you have too much of and don't feel when it's being taken. On the contrary, attention is stolen through friction, and the sum of everyone who "fills their rice bowls" this way is why the web and so many processes and activities on-line feel like shit and remain painfully wasteful.
Maybe if your business model includes putting things in an inconvenient format that could best be replaced by a bulleted list, you should rethink your business model.
I was thinking the same thing. Extraction and basic formatting of information from human language is something that LLMs excel at. Especially if the result is being shown to a human so small mistakes can be tolerated.
It's a very cool technical feat, but not something I would personally pay for. I'll just spend the 1-2 minutes to watch the video for free. Not trying to discourage you, just giving honest feedback. Launching the early landing page is a good idea to validate further.
Same with recipes, where the author frequently feels the need to recount their great-great-grandparents' life history in 10 paragraphs before getting to the ingredients and step-by-step instructions. It's sad to see what SEO has become, really...
This is pretty cool but I'd like to see a well-formatted recipe, not a transcript. I prefer the markdown format for recipes so I worked on something like this earlier this year [0]. It fetches YouTube subs (with no processing of the audio of the video itself, unlike this project) and returns a markdown with ingredients and steps.
As someone whose learning was significantly accelerated by the "written tutorial" phase of the internet, this would be a really great little tool. I find video tutorials to be far more cumbersome than text + images.
You can also use software to detect “cuts” in the video, which can be used to improve the frame-extraction over just getting six evenly spaced frames from the video.
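A minimal sketch of the cut-detection idea, not what the project does: it assumes each frame has already been reduced to a normalized color histogram by some other tool, and the names `detect_cuts` and `representative_frames` and the threshold value are made up for illustration. A cut is flagged wherever two consecutive histograms differ a lot, and one frame is then taken from the middle of each shot.

```python
from typing import List, Sequence


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute bin differences between two normalized histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_cuts(histograms: List[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Return every index i where frame i differs sharply from frame i - 1."""
    return [
        i
        for i in range(1, len(histograms))
        if histogram_distance(histograms[i - 1], histograms[i]) > threshold
    ]


def representative_frames(histograms: List[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Pick the middle frame of each shot (hypothetical helper)."""
    # Shot boundaries: start of video, every detected cut, end of video.
    boundaries = [0] + detect_cuts(histograms, threshold) + [len(histograms)]
    return [
        (start + end - 1) // 2
        for start, end in zip(boundaries, boundaries[1:])
        if end > start
    ]


if __name__ == "__main__":
    # Toy input: three shots of 3, 4 and 2 frames with two-bin histograms.
    frames = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 4 + [[0.5, 0.5]] * 2
    print(detect_cuts(frames))
    print(representative_frames(frames))
```

The threshold is the part that would need tuning on real footage; slow fades and camera pans will not look like the hard cuts in the toy input.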
This is a task called "video summarization". See https://paperswithcode.com/task/video-summarization . I guess the whole project is something like summarizing from video + subtitles + text to pictures + text.
Do video formats support structured meta data to be embedded in them?
If I make a video of me cooking, can I embed the recipe in the video, etc.? Not just visually, but e.g. at 10s, I digitally insert the data "Add 1 cup red peppers". It isn't necessarily a caption of something said or shown, just extra data.
Could a video creator leave substantially more metadata in their videos? I always assumed the pop-up metadata was externally stored and timestamp synced. Is there a way to embed it?
That sounds a bit like subtitles, or Timed Text. There are simple formats (just a text and the moment it should appear) but some formats support changing the position, color, font… Most of the time this would be embedded in an extra sidecar file like an .srt or a .sub.
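For what it's worth, here is a rough sketch of what such a sidecar could look like, using the plain SubRip (.srt) layout: a running index, a `start --> end` line with `HH:MM:SS,mmm` timestamps, the text, then a blank line. The helper names and the step data are invented for the example, and it only produces the sidecar text; muxing it into a container is a separate step not shown here.

```python
from typing import List, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def steps_to_srt(steps: List[Tuple[float, float, str]]) -> str:
    """Turn (start_seconds, end_seconds, text) tuples into SRT cue blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(steps, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical recipe steps, timed against the video.
    recipe = [
        (10.0, 15.0, "Add 1 cup red peppers"),
        (15.0, 20.0, "Stir"),
    ]
    print(steps_to_srt(recipe))
```

Since the cues are just timed text, nothing forces them to match what is said in the video; a recipe step works as well as a caption.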
Recommend passing the speech-to-text narration through a round of GPT4 API to correct for any transcription errors (use some prompt giving context that it's speech to text)
This is great, thank you for sharing! I wonder what the reverse would look like. More and more nowadays, I find myself first looking on YouTube for tutorials and walkthroughs, even if they wind up being more verbose than their written counterparts.
Based on the example shown on the page, the output doesn't seem very good. If that's one of the better examples the software produced, I don't think this will be useful in practice.
This is one of the first results. The third, if I remember correctly.
I got this running yesterday (Sunday), and I wanted to write the blog post first to test if there was any interest in this topic. Apparently, yes. Now I only have to do the remaining 80% ;-)
An evolution of this process would make it feasible to do retrieval-augmented generation using information from video content. I've thought about trying to do this to improve the (already impressive) abilities LLMs possess as a creative writing assistant/rubber ducky; a lot of good writing advice is on YouTube in the form of video essays, tutorials, lectures, etc.
The copyright notice on the output is a poor choice, since you almost certainly do not own the copyright to any of the content. You've gone to impressive lengths to ensure that the result is true to the source material, which means that there is no claim to this being a transformative work.
Ha! Print that video? Yes, but can you FIND THE PRINTER? ---- I humbly apologize, I thought this was some joke, or errant stupidity. It's not. This person has put some very serious thought into not only getting it to work, but making it useful. Very useful. You have earned my Upvote, and recommendation. Thank you Mr Forret. Thank you.
If the main challenge was 'not having the smartphone in the kitchen', then one possible solution could have been getting another screen dedicated to the kitchen. A tablet, a laptop, a small TV+Google Cast or such combination.
It seems to be a proper medium for 'printing' a video.
Of course, choosing challenges and finding solutions is what drives fun.
To me the main problem this solves is having to rewatch the video over and over for each step. Most of the time it's like "Step 2: do thing" then quickly cuts to step 3 well before I could've finished step 2. So having it laid out like this is actually a decent format to receive recipes in.
A device I think would be great for the kitchen is a large wrist-mounted, waterproof e-ink screen, curved to wrap around the wrist, with two large scroll buttons.
The recipe could be loaded up via a linked smartphone or something, but then you have a device that you can touch with food covered hands and then wash it right alongside your hands later. Big screen so you don't have to squint or scroll frequently like you would on a smartwatch. E-ink so it works well despite bright kitchen lights and has low power consumption.
I actually wonder if in the limit of video encoding we could just get a diffusion model that can in real time render realistic video based on a script. Then downloading a movie is just downloading a few megabytes of a prompt and you get a movie playing based off it locally.
Maybe. The only problem I see is economic. Sure, sending over a sequence of prompts, instead of a sequence of frames, is going to be a huge storage and bandwidth saver. However, you're going to pay for it dearly, in compute, whenever you want to watch such a live-generated video. In almost all cases, it's vastly better to use more storage than to use more compute, for the same reason that, if you need something to stay above ground level, you're better off placing it on a table or bolting it to a wall, instead of attaching it to a jet engine pointing downwards, firing for TWR=1.
quartz|2 years ago
bane|2 years ago
joncalhoun|2 years ago
Szpadel|2 years ago
JohnFen|2 years ago
attentive|2 years ago
sonicanatidae|2 years ago
dheera|2 years ago
atticora|2 years ago
TeMPOraL|2 years ago
slingnow|2 years ago
EricMausler|2 years ago
thegabriele|2 years ago
binarymax|2 years ago
kevincox|2 years ago
pforret|2 years ago
cloudking|2 years ago
avgcorrection|2 years ago
> We’ve all been there: we used the florb for too many glorbs and now it needs to be replaced. [...]
> This is an experience that everyone at the staff of howto.biz.uk has had! [...]
> But how do you replace a used-up florb? In this article we are going to show you how. [...]
> [scan the next five paragraphs]
NikxDa|2 years ago
gsa|2 years ago
[0] https://github.com/gaganpreet/summarise-youtube-recipes
TrevorJ|2 years ago
RBerenguel|2 years ago
The use case is technical videos (like from conferences) that I'm interested in, but not enough to invest 20-60 minutes.
Haven’t used it in a few months so the yt-dlp commands may need updating.
dan-g|2 years ago
polygamous_bat|2 years ago
unknown|2 years ago
[deleted]
anewhnaccount2|2 years ago
patates|2 years ago
hermannj314|2 years ago
pforret|2 years ago
thomastjeffery|2 years ago
jsharf|2 years ago
xnzakg|2 years ago
barrkel|2 years ago
jusquan|2 years ago
pforret|2 years ago
tgsovlerkhgsel|2 years ago
pforret|2 years ago
lucubratory|2 years ago
paledot|2 years ago
(Very cool and useful project, though.)
ForOldHack|2 years ago
zoomablemind|2 years ago
chankstein38|2 years ago
goda90|2 years ago
roomey|2 years ago
You could even do a little line drawing of the important bits.
You could keep this "cook" book in your kitchen, and maybe pass it to one of your kids (just an example) when they move out or something.
IgorPartola|2 years ago
TeMPOraL|2 years ago
parthianshotgun|2 years ago
adr1an|2 years ago
ada1981|2 years ago
I didn’t see how that printout would be super useful; it’s not the complete step-by-step, is it?
einpoklum|2 years ago
* It does not print the video frames as a 3D object.
* Despite what the graphic at the link suggests, it doesn't 3D-print food
it extracts a recipe with images and text from a video, automatically.
incahoots|2 years ago
1970-01-01|2 years ago
mannyv|2 years ago
benob|2 years ago
a1o|2 years ago
unknown|2 years ago
[deleted]
Hugsun|2 years ago
Gys|2 years ago