top | item 40709710

Vi-Su – Video summarization with key screenshots

3 points | Yannael | 1 year ago | vi-su.app

3 comments

Yannael|1 year ago

Hey HN!

I have been working on this app for the last few weeks, because adding screenshots to video summaries helps me remember the content better (and also makes the summary more engaging to read). I could not find any video summarisation app that does that.

It basically creates visual video summaries (hence vi-su) with text and screenshots of the videos. GPT-4 is used in the backend to generate the summary and select the screenshots to include.

I find it particularly useful for educational videos like courses, tutorials, conference talks, and documentaries where visuals matter.

The app is still in development, so there may be some quirks, but I’d now love to hear some feedback and suggestions for improvement. Thanks for trying it out and sharing your thoughts!

pkkm|1 year ago

What a coincidence, I was actually just searching for a tool that would convert conference talks into blog posts. Since I learn faster through reading than listening, I thought that such a tool would let me absorb the information from each talk in a half or a quarter of the time.

I found this, videoticle.com (gives just one screenshot + raw transcript), videogist.co (keeps erroring out), and some commercial tools (couldn't be tried without paying first).

Here's some feedback, since you asked. I tested your tool on one of my favorite technical talks, Linux Memory Management at Scale <https://www.youtube.com/watch?v=QZZWAsBI_zY>, resulting in <https://vi-su.app/QZZWAsBI_zY/summary.html>. It was very easy to use, worked on the first try, and took less time than I expected to generate the summary.

Unfortunately, I don't find the result useful. The summary is mostly vague generalities like "Clarifies common misconceptions about reclaimable memory", "Explains the role of swap in memory management", and "Discusses the limitations of the OOM killer". It's actually less detailed than the text on the slides, so it doesn't provide extra value over just running a scene detection tool on the video to get one screenshot per slide. All the technical details, which are the reason I like the talk so much, are missing: the difference between kswapd and direct reclaim, why the kernel OOM killer won't save you (it runs as a last resort, which can be a double-digit number of minutes after your system has become unresponsive), why Facebook shifted from using memory limits to memory minimums (minimums compose better), and why they prefer btrfs to ext4 (the ext4 journal causes priority inversions when used with cgroup priorities).
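
To make the "scene detection" baseline concrete, here is a rough sketch of the idea: keep a frame whenever it differs enough from the previous one. It is only an illustration under stated assumptions, not how any particular tool works. Frames are assumed to be already decoded into flat lists of grayscale pixel values, and the names `mean_abs_diff` and `slide_change_indices` and the threshold of 12 are all hypothetical.

```python
from typing import List, Sequence


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Average per-pixel absolute difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def slide_change_indices(
    frames: Sequence[Sequence[int]], threshold: float = 12.0
) -> List[int]:
    """Return the indices of frames that appear to start a new slide.

    Frame 0 always counts. A later frame counts when it differs from the
    previous frame by more than `threshold` (an assumed, untuned value).
    """
    if not frames:
        return []
    keep = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            keep.append(i)
    return keep


# Toy input: three "slides", each held for one or more frames.
slide_a = [10] * 16
slide_b = [200] * 16
slide_c = [90] * 16
frames = [slide_a, slide_a, slide_b, slide_b, slide_b, slide_c]
print(slide_change_indices(frames))  # [0, 2, 5]
```

A real video would need decoding and probably some smoothing against camera noise and slide animations, which this sketch ignores.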

I was looking for something that would preserve all the detail in the talk and could reliably work as a replacement for watching it. My ideal UI would be two columns: on the left, there would be screenshots, and on the right, there would be a full transcript, just cleaned up so that it reads well: ums, ahs, hellos, and thank yous removed, sentences rephrased to be more concise, everything divided into sections and paragraphs.
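
The simplest part of that cleanup, dropping filler words, could be sketched roughly as below. This is a minimal illustration only: the filler list and the function name `clean_transcript` are assumptions, and it makes no attempt at the harder steps (rephrasing sentences, splitting into sections and paragraphs).

```python
import re

# Hypothetical filler list; a real one would need tuning per speaker.
FILLERS = ("um", "umm", "uh", "ah", "er")

# Match a whole filler word, an optional trailing comma or period,
# and any whitespace that follows it.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(FILLERS) + r")\b[,.]?\s*",
    flags=re.IGNORECASE,
)


def clean_transcript(text: str) -> str:
    """Remove filler words and collapse runs of whitespace."""
    without_fillers = _FILLER_RE.sub("", text)
    return re.sub(r"\s+", " ", without_fillers).strip()


raw = "So um the OOM killer uh runs as a last resort."
print(clean_transcript(raw))  # So the OOM killer runs as a last resort.
```

Because the pattern uses word boundaries, the "er" in "killer" is left alone. The match is case-insensitive, though, so a standalone acronym such as "ER" would also be removed.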

Regardless, thanks for posting. I'm glad that people are working on tooling like this.