It's not that simple, but it would be straightforward to duplicate the outputs of this with a simple LLM + ffmpeg workflow. They did mention a custom model on the landing page, and if they've trained one, you would be spending much more money on each output than they are, because without a fine-tuned model there would be a lot of inference done for QA and refinement of each prompt, clip, and frame.
MarcelOlsz|6 months ago