(no title)
divyaprakash | 1 month ago
The Tech:
GPU Heavy: It uses decord and PyTorch for scene analysis. I’m calculating action density and spectral flux locally to find hooks before hitting an LLM.
Local Audio: I’m using ChatterBox locally for TTS to avoid recurring costs and privacy leaks.
Rendering: Final assembly is offloaded to NVENC.
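For anyone wondering what "spectral flux" means in the hook-finding step, here is a minimal sketch, not the project's actual code. It assumes mono audio already decoded into a NumPy array; the function name `spectral_flux` and the `frame_size`/`hop` defaults are hypothetical choices for illustration. It measures how much spectral energy rises from one frame to the next, so sudden onsets show up as peaks.

```python
import numpy as np

def spectral_flux(samples, frame_size=1024, hop=512):
    """Half-wave rectified spectral flux between consecutive frames.

    Illustrative only: frame_size and hop are assumed values,
    not settings taken from the project.
    """
    samples = np.asarray(samples, dtype=np.float64)
    if len(samples) < frame_size:
        return np.zeros(0)
    n_frames = 1 + (len(samples) - frame_size) // hop
    window = np.hanning(frame_size)
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        samples[i * hop : i * hop + frame_size] * window
        for i in range(n_frames)
    ])
    # Magnitude spectrum of each frame.
    mags = np.abs(np.fft.rfft(frames, axis=1))
    # Keep only increases in energy per frequency bin, then sum over bins.
    rise = np.maximum(np.diff(mags, axis=0), 0.0)
    return rise.sum(axis=1)

# Example: silence followed by a 440 Hz tone (8 kHz sample rate assumed).
sr = 8000
t = np.arange(4096) / sr
audio = np.concatenate([np.zeros(4096), np.sin(2 * np.pi * 440 * t)])
flux = spectral_flux(audio)
onset_frame = int(np.argmax(flux))  # frame transition where the tone starts
```

Peaks in `flux` could then be combined with a visual score such as action density to rank candidate hooks before anything is sent to an LLM.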
Looking for Collaborators: I’m currently looking for PRs specifically around:
Intelligent Auto-Zoom: Using YOLO/RT-DETR to follow the action in a 9:16 crop.
Voice Engine Upgrades: Moving toward ChatterBoxTurbo or NVIDIA's latest TTS.
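For potential contributors, a rough sketch of the geometry behind the auto-zoom idea, under stated assumptions: a detector (YOLO, RT-DETR, or anything else) has already produced one horizontal center of action per frame. The helpers `smooth_centers` and `crop_window` are hypothetical names, not part of the repo, and exponential smoothing is just one simple way to keep the crop from jittering.

```python
def smooth_centers(centers, alpha=0.2):
    """Exponentially smooth per-frame x-centers to reduce crop jitter."""
    smoothed = []
    prev = None
    for c in centers:
        prev = c if prev is None else alpha * c + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed

def crop_window(frame_w, frame_h, center_x, aspect_w=9, aspect_h=16):
    """Return (left, top, width, height) of a full-height vertical crop.

    The width is rounded down to an even number, since common
    4:2:0 pixel formats require even dimensions.
    """
    crop_w = min(frame_w, frame_h * aspect_w // aspect_h)
    crop_w -= crop_w % 2
    left = int(round(center_x - crop_w / 2))
    # Clamp so the crop never leaves the frame.
    left = max(0, min(left, frame_w - crop_w))
    return left, 0, crop_w, frame_h

# Example on a 1920x1080 source with made-up detection centers.
centers = smooth_centers([960, 1000, 1400, 1900])
windows = [crop_window(1920, 1080, c) for c in centers]
```

A real implementation would also need to handle frames with no detection and scene cuts, where the smoothing state should probably be reset.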
It's fully Dockerized and also has a Makefile. Would love some feedback on the pipeline architecture!
amelius|1 month ago
This is the first sentence in your features section, so it's not surprising if users can't tell whether this tool runs locally or not.
divyaprakash|1 month ago
ramon156|1 month ago
Still a cool tool though! Although it seems partly AI generated.
fouc|1 month ago
rustyhancock|1 month ago
[deleted]
pelasaco|1 month ago
ithkuil|1 month ago