FFmpeg has complex syntax because it’s dealing with the _complexity of video_. I agree with everyone about knowing (and helping create or contribute to) our tools.
Today I largely forget about the _legacy_ of video, the technical challenges, and how critical it was to get it right.
There are an incredible number of output formats and considerations for _current_ screens (desktop, tablet, mobile, tv, etc…). Then we have a whole other world on the creation side for capture, edit, live broadcast…
On legacy formats it used to be so complex, with standards, requirements, and evolving formats. Today we don't even stop to ask why 29.97 fps is still around, or why interlacing ever existed.
We have a mix of so many incredible (and sometimes frustrating) codecs, needs and final outputs, so it’s really amazing the power we have with a tool like FFmpeg… It’s daunting but really well thought out.
So just a big thanks to the FFmpeg team for all their incredible work over the years…
Have to admit, ffmpeg syntax is not trivial... but the project is also 24 years old and basically the de facto industry standard. If you believe you will still be editing videos in 20 years with the CLI (or any other tool or programming language wrapping it), then it's probably worth a few hours learning how it actually works.
The syntax isn't too bad. The problem is that I have to use it a couple of times a year, on average. So every time I've forgotten and have to relearn. This doesn't happen with GUIs nearly as much, and it's why I prefer them over CLI tools for anything that I don't do at least once every week or two.
My question/curiosity: why do so many people use ffmpeg (and get frustrated by its syntax) when GStreamer is available?
`gst-launch-1.0 filesrc ! qtdemux ! matroskamux ! filesink...` — maybe people would be less frustrated?
People would also learn a little more, and be less frustrated when conversations about containers/codecs/colorspaces etc. come up. Each has a dedicated element, and you can better understand its I/O.
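To make the element-per-stage point concrete, here is a sketch of a simple remux expressed as a gst-launch pipeline, built in Python so the stages are easy to see. The element names (`qtdemux`, `h264parse`, `matroskamux`) are standard GStreamer plugins, but the exact pipeline you need depends on the codecs involved, so treat this as illustrative rather than definitive:

```python
def gst_remux_cmd(src: str, dst: str) -> list[str]:
    """Build a gst-launch-1.0 command that remuxes an MP4 into Matroska.

    Each stage is a dedicated element, which is the point made above:
    demuxing, parsing, and muxing are visibly separate steps rather
    than flags on one monolithic command.
    """
    pipeline = [
        "filesrc", f"location={src}", "!",
        "qtdemux", "!",           # demux the MP4/QuickTime container
        "h264parse", "!",         # re-frame the H.264 stream for the muxer
        "matroskamux", "!",       # mux into a .mkv container
        "filesink", f"location={dst}",
    ]
    return ["gst-launch-1.0"] + pipeline

print(gst_remux_cmd("in.mp4", "out.mkv"))
```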
I agree. I suggest using this instead: https://github.com/kkroening/ffmpeg-python . While not perfect, once you figure it out it is far easier to use, and you can wrap more complicated workflows and reuse them later.
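ffmpeg-python aside, the "wrap it once and reuse it later" idea works even in plain Python. The function below is an illustrative sketch (none of these names come from any library, and the filenames are made up):

```python
def downscale_recipe(src, dst, width=1280, crf=23):
    """A reusable ffmpeg recipe: downscale to `width`, keep the aspect ratio.

    Returns the argv list so callers can inspect, log, or reuse it;
    run it with subprocess.run(args, check=True).
    """
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-vf", f"scale={width}:-2",  # -2 keeps the height even, as encoders require
        "-crf", str(crf),
        dst,
    ]

print(downscale_recipe("input.mp4", "output.mp4"))
```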
I think that goes for almost every tool you want to use with an LLM. Ideally the user should already know the tool, so mistakes by the LLM can be caught before they happen.
Here, making ffmpeg "just another capability" allows it to be stitched together into workflows.
True. Companies like Descript, Veed, or Kapwing exist because non-coders find this syntax intimidating. Plus, a CLI tool sticks out of a workflow. We wanted to change that.
The thing that helped me get over that ffmpeg bump, where you go from copying Stack Overflow answers to actually sort of understanding what you are doing, is the fairly recent include-external-file syntax (https://ffmpeg.org/ffmpeg-filters.html#toc-Filtergraph-synta...). On the surface it is such a minor thing, but mentally it let me turn what was a confusing mess into a programming language. There are a couple of ways to invoke it, but the one I used was to load the whole file as an arg. Note the slash, it is important: "-/filter_complex filter_file"
"A special syntax implemented in the ffmpeg CLI tool allows loading option values from files. This is done by prepending a slash '/' to the option name, then the supplied value is interpreted as a path from which the actual value is loaded."
For how critical that was to getting over my ffmpeg hump, I wish it was not buried halfway through the documentation, but also, I don't know where else it would go.
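In practice the pattern looks something like the sketch below: write the filtergraph to its own file, then point `-/filter_complex` at it. The filenames and the graph itself are made up for illustration, and the `-/option` form requires an ffmpeg new enough to support loading option values from files:

```python
from pathlib import Path

# A multi-line filtergraph is much easier to read and edit than the
# same graph crammed into one quoted shell argument.
graph = (
    "[0:v] split [main][small];\n"
    "[small] scale=w=iw/4:h=-1 [scaled];\n"
    "[main][scaled] overlay=x=W-w:y=H-h [out]"
)
Path("pip.filter").write_text(graph)

# The leading slash tells ffmpeg to read the option *value* from the file.
cmd = [
    "ffmpeg", "-i", "in.mp4",
    "-/filter_complex", "pip.filter",
    "-map", "[out]",
    "out.mp4",
]
print(" ".join(cmd))
```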
And, just because I am very proud of my accomplishment, here is the ffmpeg side of my project: motion detection using mainly ffmpeg. There is some Python glue logic to watch stdout for the events, but all the tricky bits are internal to ffmpeg.
The filter (comments added for audience understanding):
[0:v]
split #split the camera feed into two parts, passthrough and motion
[vis],
scale= #scale the motion feed way down, less cpu and it works better
w=iw/4:
h=-1,
format= #needed because blend did not work as expected with yuv
gbrp,
tmix= #temporal blur to reduce artifacts
frames=2,
[1:v] #the mask frame
blend= #mask the motion feed
all_mode=darken,
tblend= #motion detect actual, the difference from the last frame
all_mode=difference,
boxblur= #blur the hell out of it to increase the number of motion pixels
lr=20,
maskfun= #mask it to black and white
low=3:
high=3,
negate, #make the motion pixels black
blackframe= #puts events on stdout when too many black pixels are found
amount=1
[motion]; #motion output
[vis]
tpad= #delay pass through so you get the start of the event when notified
start=30
[original]; #passthrough output
and the ffmpeg invocation:
ff_args = [
    'ffmpeg',
    '-nostats',
    '-an',
    '-i',
    camera_loc,  # a security camera
    '-i',
    'zone_all.png',  # mask of which parts are relevant for motion detection
    '-/filter_complex',
    'motion_display.filter',  # the filter doing all the work
    '-map',  # sort out the outputs from the filter
    '[original]',
    '-f',
    'mpegts',  # I feel a little weird using mpegts, but it was the best "streaming" of all the formats I tried
    'udp://127.0.0.1:8888',  # collect the full video from here
    '-map',
    '[motion]',
    '-f',
    'mpegts',
    'udp://127.0.0.1:8889',  # collect the motion output from here, mainly for debugging
]
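For completeness, the glue logic on the Python side can be as small as a regex over ffmpeg's log lines. This is a sketch: the exact prefix of blackframe's report lines varies between builds, so the pattern below is an approximation rather than a guaranteed format:

```python
import re

# blackframe reports look roughly like:
#   [Parsed_blackframe_6 @ 0x55...] frame:130 pblack:99 pts:4184 t:4.184 type:P ...
EVENT = re.compile(r"blackframe.*?frame:(\d+)\s+pblack:(\d+).*?\bt:([\d.]+)")

def parse_blackframe(line):
    """Return a dict for a blackframe report line, or None for other output."""
    m = EVENT.search(line)
    if not m:
        return None
    frame, pblack, t = m.groups()
    return {"frame": int(frame), "pblack": int(pblack), "t": float(t)}

sample = "[Parsed_blackframe_6 @ 0x55] frame:130 pblack:99 pts:4184 t:4.184 type:P"
print(parse_blackframe(sample))
```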
As someone who has used ffmpeg for 10+ years maintaining a relatively complex backend service that's basically a JSON-to-ffmpeg translator, I did not fully understand this article.
For one thing, the Before vs. After section doesn't even seem to create the same thing: the before has no speed-up, the after does.
In the end it seems they basically created a few services ("recipes") that they can reuse to do simple stuff like speed up 2x or combine audio/video or whatever.
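A "speed up 2x" recipe really does reduce to a small, fixed filtergraph. A sketch of what such a recipe might build (note that a single `atempo` instance only accepts factors in a limited range, so large factors need chaining):

```python
def speedup_cmd(src, dst, factor=2.0):
    """Build an ffmpeg command that speeds up both video and audio.

    setpts compresses the video timestamps; atempo raises the audio
    tempo without changing pitch.
    """
    graph = f"[0:v]setpts=PTS/{factor}[v];[0:a]atempo={factor}[a]"
    return [
        "ffmpeg", "-i", src,
        "-filter_complex", graph,
        "-map", "[v]", "-map", "[a]",
        dst,
    ]

print(speedup_cmd("in.mp4", "out.mp4"))
```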
Thanks for calling it out; I will correct the Before vs. After section. But you can describe any ffmpeg capability in plain English and the underlying ffmpeg tool call takes care of it.
This doesn't make any sense; the Before and After examples accomplish different things. I also don't get who the target audience is; people intimidated by a CLI tool but at home with technical agents?
I just tell Claude Code what I want to do and that it has imagemagick and ffmpeg available and it does all the work for me. Because it's got an agentic flow, it loops around, checks the output and fixes things up.
I can ask it to orient people the right way, crop to the important parts, etc. and it will figure out what "the right way", "the important parts", etc. are. Sometimes I have to give it some light hints like "extract n frames from before y to figure out things", but most of the time it just does it.
Claude Code acts like a very general purpose agent for me. About the one thing that I have to manually do that I'm annoyed by is editing 360 videos into a flow. I'd like to be able to tell Claude Code to "follow my daughter as I dunk her in the pool" and stuff like that but I have to do that myself in the GoPro editor.
I consider FFmpeg a great project, but I usually avoid using it directly because of its quite complex syntax. I'm reconsidering it because, coupled with an LLM, it's very straightforward and more immediate than a usual graphical editor.
100x.bot is primarily a browser automation engine (think iMacros), but with an LLM, all the tools for interacting with the DOM, and a better interface. There is a workflow builder, so you do not need to rely on the LLM for executing deterministic workflows.
ffmpeg is the only community where I've asked for help and been told "if you have to ask, you're too stupid to use this project". Needless to say, it was a welcoming community I continued engaging with.
echelon|3 months ago
It's dealing with 3D data (more if you count audio or other tracks) and multi-dimensional transforms from a command line.
charcircuit|3 months ago
It's complexity paired with bad design, making the situation worse than it could be.
IsTom|3 months ago
-filter_complex_script is a thing
harrall|3 months ago
- For one-offs, you would just use a GUI.
- For regular edits where you want creative control, you would use a NLE GUI.
- For regular edits where you want consistency, you would have a limited GUI without access to ffmpeg options.
CLI/prompt-based editing for a visual medium is how a programmer might approach editing, but no creative would…
qmr|3 months ago
https://youtu.be/9kaIXkImCAM
pinter69|3 months ago
This is a nice resource: https://amiaopensource.github.io/ffmprovisr/
And also I've written this cheatsheet, which is designed to be used alongside an LLM: https://github.com/rendi-api/ffmpeg-cheatsheet
Let me know if you're interested in more resources
kwanbix|3 months ago
It works 99% of the time for my use case.