I built this because I was tired of "AI tools" that were just wrappers around expensive APIs with high latency. As a developer who lives in the terminal (Arch/Nushell), I wanted something that felt like a CLI tool and respected my hardware.
The Tech:
GPU Heavy: It uses decord and PyTorch for scene analysis. I’m calculating action density and spectral flux locally to find hooks before hitting an LLM.
Local Audio: I’m using ChatterBox locally for TTS to avoid recurring costs and privacy leaks.
Rendering: Final assembly is offloaded to NVENC.
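For readers unfamiliar with spectral flux, here is a minimal sketch of the idea. It is an illustration only, not the project's code: the function names, frame size, and hop size are all assumptions, and it uses a naive standard-library DFT instead of PyTorch so that it runs anywhere. Flux here is the sum of positive magnitude increases between consecutive audio frames, so a sudden sound produces a spike.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for the non-negative frequency bins of one frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectral_flux(samples, frame_size=64, hop=32):
    """Sum of positive magnitude increases between consecutive frames."""
    flux = []
    prev = None
    for start in range(0, len(samples) - frame_size + 1, hop):
        mags = magnitude_spectrum(samples[start:start + frame_size])
        if prev is not None:
            # Only count bins whose energy went up (half-wave rectification).
            flux.append(sum(max(m - p, 0.0) for m, p in zip(mags, prev)))
        prev = mags
    return flux


# Hypothetical input: silence followed by a sudden tone.
silence = [0.0] * 128
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(128)]
flux = spectral_flux(silence + tone)

# Index of the frame transition with the largest flux (a candidate "hook").
peak_index = max(range(len(flux)), key=flux.__getitem__)
```

In this toy input the flux stays at zero during silence and spikes where the tone starts. A real pipeline would presumably use a windowed FFT on the GPU and combine the result with visual cues.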
Looking for Collaborators: I’m currently looking for PRs specifically around:
Intelligent Auto-Zoom: Using YOLO/RT-DETR to follow the action in a 9:16 crop.
Voice Engine Upgrades: Moving toward ChatterBoxTurbo or NVIDIA's latest TTS.
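As a starting point for the auto-zoom idea, the crop geometry can be sketched independently of the detector. This is a hypothetical sketch, not part of the project: `crop_window` and `smooth_centers` are made-up names, and it assumes a detector (YOLO, RT-DETR, or anything else) has already produced one horizontal action center per frame.

```python
def crop_window(frame_w, frame_h, center_x, aspect_w=9, aspect_h=16):
    """Return (left, top, width, height) of a full-height crop near center_x."""
    crop_h = frame_h
    crop_w = min(frame_w, frame_h * aspect_w // aspect_h)
    crop_w -= crop_w % 2  # keep the width even, which video encoders generally prefer
    left = int(center_x - crop_w / 2)
    # Clamp so the window never leaves the frame.
    left = max(0, min(left, frame_w - crop_w))
    return left, 0, crop_w, crop_h


def smooth_centers(centers, alpha=0.2):
    """Exponential moving average over per-frame centers to reduce jitter."""
    smoothed = []
    current = None
    for c in centers:
        current = c if current is None else alpha * c + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


# Hypothetical per-frame detections on a 1920x1080 source.
centers = smooth_centers([960, 1000, 1800, 1850])
windows = [crop_window(1920, 1080, c) for c in centers]
```

Smoothing the centers before cropping keeps the virtual camera from snapping to every detection. The `alpha` value is a tuning knob and the default here is a guess.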
It's fully dockerized and also has a Makefile. Would love some feedback on the pipeline architecture!
I don't get this reasoning. You were tired of LLM wrappers, but what is your tool? These two requirements (felt like a CLI and respects your hardware) do not line up.
Still a cool tool though! Although it seems partly AI generated.
I watched a video[1] recently that posited that AI slop farms will make it impossible to find meaningful human content in large, auto-moderated spaces, and that this will lead to a renaissance in which smaller, more personal websites like forums and other niche places flourish.
I think that sounds a little too convenient and idealistic to be what really happens, but I did find the concept to be a potential positive to what's happening around it. Facebook is already a good portion of the way there, being stuffed with bots consuming stolen or AI content from other bots, with confused elderly people in the middle.
What's the intended use case for this? It seems like you'd create slop videos for social media. I'd love to see more AI use cases that aren't just uninteresting content people would prefer to avoid.
It’s actually designed for your own gameplay: it scans hours-long raw sessions to find the best highlights and clips them into shorts. It's more about automating the tedious editing process for your own content than generating "slop" from scratch.
big fan of the 'respects my hardware' philosophy. i feel like 90% of ai tools right now are just expensive middleware for openai, so seeing something that actually leverages local compute (and doesn't leak data) is refreshing
Definitely. The architecture is modular: just swap the LLM prompts for 'cinematic' styles. It's headless and dockerized, so it fits well as a SaaS backend worker.
divyaprakash|1 month ago
amelius|1 month ago
This is the first sentence in your features section, so it is not strange if users don't understand if this tool is running locally or not.
ramon156|1 month ago
pelasaco|1 month ago
HeartofCPU|1 month ago
divyaprakash|1 month ago
Jgrace|1 month ago
[deleted]
wasmainiac|1 month ago
Regardless, we need more tools like this to speed social media towards death.
divyaprakash|1 month ago
techjamie|1 month ago
[1] https://youtu.be/_QlsGkDvVHU
myky22|1 month ago
I did something similar 4 years ago with Ultralytics YOLO.
Back then I used chat message spikes as one of several variables to detect highlight and fail moments. It needed a lot of human validation but was so fun.
Keep going
divyaprakash|1 month ago
8organicbits|1 month ago
divyaprakash|1 month ago
Huston1992|1 month ago
mpaepper|1 month ago
divyaprakash|1 month ago
Yash16|1 month ago
[deleted]
divyaprakash|1 month ago