Well sure, but hardware encode and decode isn't completely widespread yet. I've been patiently waiting for what has felt like an eternity. From the developer perspective, everyone needs access to it; otherwise I'm just sitting on my hands waiting for the capability to trickle down to the majority of users. Hopefully more users will purchase hardware if it features AV1 encode/decode. They need a logo that says "AV1 inside" or something. For example, only the iPhone 15 Pro offers hardware decode so far in the iPhone lineup.
I was recording videos of lectures (a slowly moving person against a whiteboard with thin sharp lines on it, in a questionably well-lit room) and needed to encode them to smallish 720p files to be posted somewhere (ended up uploading them to the Internet Archive). This is not something that x264’s default encoding profiles do well on, but with a day or two of fiddling with the settings I had something I could run on my 2014-era iGPU-only laptop the night after the lecture and have the result up the next day.
By contrast, libaom promised me something like a week of rendering time for the three hours of video. Perhaps this could be brought down (my x264 fiddling got me veryslow-comparable results several times faster), but the defaults were so bad I couldn’t afford to experiment. (This was some four years ago. Things have probably gotten better since then, but I don’t expect miracles.)
In an 8-bit colorspace, H.264/H.265 suffer from virtually no block artifacts. AV1 can't get rid of them without going up to 10-bit. Not that it's necessarily a problem.
The real problem with AV1 is how darn computationally intensive it is to compress.
I do a lot of video compression for hobby projects, and I stick with h264 for the most part because h265 encoding requires far too much extra compute relative to the space savings. I can spend an hour compressing a file down to 1 GB with h264, or I can spend 12 hours compressing the same file to 850 MB with h265. Depending on the use case, I might still need the h264 version anyway, since it's far more widely supported by clients. If I had a data center's worth of compute to throw at encoding, or I were running a streaming service where the extra 150 MB per video started to add up, then I'd definitely be on board with h265, but it's really hard to justify for a lot of practical use cases.
I really like HEVC/h265. It's pretty much on par with VP9, but licensing trouble has made it difficult to get adopted everywhere even now. VVC/h266 seems to be having the same issues; AV1 is pretty much just as good and is already seeing much more adoption.
I've always felt like H.264 hit a great sweet spot of complexity vs compression. Newer codecs compress better, but they're increasingly complex in a nonlinear way.
> Well OK, but what about H.265? It's one louder, isn't it?
I had some old, gigantic, video footage (in a variety of old, inefficient, formats at a super high quality).
So I did some testing and, well, ended up re-encoding/transcoding everything to H.265. It makes for much smaller files than H.264. The standard is also ten years younger than H.264 (2013 for H.265 vs 2003 for H.264).
Maybe I need to change some options, but I find that whatever encoder I get when I ask ffmpeg to encode h265 takes a very long time. Then decoding 4K h265 is very slow on most of my PCs.
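If the slowness is libx265 at its defaults, the `-preset` and `-crf` flags (real ffmpeg/libx265 options) are the usual speed-versus-compression knobs. A sketch that only assembles the command line (file names and values here are illustrative, not a recommendation):

```python
def x265_cmd(src: str, dst: str, preset: str = "fast", crf: int = 28) -> list[str]:
    """Build an ffmpeg command line for a faster x265 encode.

    -preset trades compression efficiency for encode speed (ultrafast..placebo);
    -crf sets the constant-quality level (higher = smaller file, lower quality).
    """
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx265",
        "-preset", preset,
        "-crf", str(crf),
        "-tag:v", "hvc1",  # helps some players (e.g. Apple's) recognize the HEVC stream
        dst,
    ]

print(" ".join(x265_cmd("in.mkv", "out.mp4")))
```

Dropping from the default `medium` preset to `fast` or `faster` usually cuts encode time substantially at a modest bitrate cost; decode speed, though, is a different problem that presets won't fix.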
> Suppose you have some strange coin - you've tossed it 10 times, and every time it lands on heads. How would you describe this information to someone? You wouldn't say HHHHHHHHH. You would just say "10 tosses, all heads" - bam! You've just compressed some data!
There appears to be some lossy compression on that string of "H"s.
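The compression scheme in that quote is just run-length encoding; a minimal sketch (the function name is mine):

```python
from itertools import groupby

def rle(s: str) -> list[tuple[str, int]]:
    # Collapse each run of identical symbols into a (symbol, count) pair.
    return [(ch, sum(1 for _ in run)) for ch, run in groupby(s)]

print(rle("HHHHHHHHHH"))  # ten heads -> [('H', 10)]
```

A run of ten identical symbols collapses to a single pair — the "10 tosses, all heads" of the quote — while a mixed sequence like `HHTTH` compresses much less.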
At that time, I was obsessed with mplayer and would download and build the latest releases on a regular basis. The first time I got an h264 file, mplayer wouldn't read it, so I had to get the dev version and build it.
It worked, and I realized two things: the quality was incredible, and my Athlon 1800+ couldn't cope with it. Subsequent mplayer (or libavcodec?) versions vastly improved performance, but I still remember that day.
Yep. Same here. Man, I have not used mplayer in a very long time. But, it was the bee's knees.
I used to work for a company that was developing some video-based products. Another company in Vegas sold our bosses a "revolutionary video codec" and a player. We had to sign NDAs to use it.
I used it and observed that it worked like mplayer. Too much like it. Five more minutes of investigation and the jig was up. Our execs, who had paid a lot of money to the Vegas company, had egg on their faces.
It is shockingly easy to scam non-tech decision makers in the tech field. They are too worried about being left behind. Smart ones will pull in a "good" engineer for evaluation. Dunning-Kruger subjects will line up with their wallets out.
Back in 1999, I was leaving a startup that had been acquired, but I didn't want to stick around. I had been doing MPEG encoding; this is relevant later.
One of the companies I interviewed with had come up with a new video compression scheme. They were very tight lipped but after signing NDA paperwork, they showed me some short clips they had encoded/decoded via a non-realtime software codec. I was interviewing to create an ASIC version of the algorithm. Even seeing just a minute or two of their codec output, I guessed what they were doing. I suggested that their examples were playing to the strengths of their algorithm and suggested some scenarios that would be more challenging. I also described what I thought they were doing. They neither confirmed nor denied, but they had me come back for a second round.
In the second round I talked with the founders, a husband/wife CEO/CTO (IIRC) team. That is when I learned their plan wasn't to sell ASICs, but they wanted to keep their codec secret and instead create DSL-based cable network using the ASICs for video distribution. I said something like, it sounds like you have invented a better carburetor and instead of selling carburetors you are planning on building a car factory to compete with GM. It was cheeky, and they didn't take it well.
Getting to the point of how this story relates to H.264, their claim was: conventional compression has reached the limit, and so their codec would allow them to do something nobody else can do: send high-def video over DSL lines. I replied: I think compressors will continue to get better, and even if not, higher speed internet service will eventually come to houses and remove the particular threshold they think only they can cross. Oh, no, they replied, physics dictates the speed at which bits can be sent on a wire and we are at that limit.
I didn't get (and didn't want) a job offer. The company did get some VC money but folded a few years later, much more efficient codecs were developed by others, and 2 Mbps internet connections were not a limit. I'm sure the actual algorithm (which I describe later at a high level) had a lot of clever math and algorithmic muscle to make it work -- they weren't stupid technically, just stupid from a business sense.
This retelling makes me sound like a smug know-it-all, but it is one of two times in my life where I looked at something and in seconds figured out their secret sauce. There are far more examples where I am an idiot.
What was their algorithm? They never confirmed it, but based on silhouette artifacts, it seemed pretty clear what they were doing. MPEG (like jpeg) works by compressing images in small blocks (8x8, 16x16, and a few others). That limits how much spatial redundancy can be used to compress the image, but it also limits the computational costs of finding that redundancy. Their codec was similar to what Microsoft had proposed for their Talisman graphics architecture in the late 90s.
From what I could tell, they would analyze a sequence of frames and rather than segmenting the image into fixed blocks like mpeg, they would find regions with semi-arbitrary boundaries that were structurally coherent -- eg, if the scene was a tennis match, they could tell that the background was pretty "rigid" -- if a pixel appeared to move (the camera panned) then the nearby pixels were highly likely to make the same spatial transformation. Although each player changed frame to frame, that blob had some kind of correlation in lighting and position. Once identified, they would isolate a given region and compress that image using whatever (probably similar to jpeg). In subsequent frames they'd analyze the affine (or perhaps more general) transformation from a region from one frame to the next, then encode that transformation via a few parameters. That would be the basis of the next frame prediction, and if done well, not many bits need to be sent to fix up the misprediction errors.
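A toy numpy sketch of that kind of parametric motion prediction (purely illustrative — this is my guess at the general idea, not their actual algorithm): warp a reference region by an affine map p' = A p + t, and what remains to encode is the transform's few parameters plus a mostly-zero residual.

```python
import numpy as np

def affine_predict(ref: np.ndarray, A: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Predict a frame by warping `ref` with the affine map p' = A p + t
    (nearest-neighbor sampling, edge-clamped)."""
    h, w = ref.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    dst = np.stack([ys.ravel(), xs.ravel()]).astype(float)  # target pixel coords
    src = np.linalg.inv(A) @ (dst - t[:, None])             # where each target samples from
    sy, sx = np.rint(src).astype(int)
    return ref[np.clip(sy, 0, h - 1), np.clip(sx, 0, w - 1)].reshape(h, w)

# "Camera pans down one pixel": the decoder only needs (A, t) plus a residual
# that is zero everywhere except the edge row where new content appears.
ref = np.arange(64, dtype=float).reshape(8, 8)
nxt = np.roll(ref, 1, axis=0)                                # next frame: content shifted down
pred = affine_predict(ref, np.eye(2), np.array([1.0, 0.0]))  # translation-only affine map
residual = nxt - pred
```

Block-based MPEG does the translation-only version of this per 16x16 macroblock; the scheme described above generalizes the regions and the transform, at much higher search cost.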
> they weren't stupid technically, just stupid from a business sense
I wouldn’t call them “stupid technically”, but assuming that there was a hard limit to codec efficiency and to internet bandwidth would at least make them technically naive in my opinion.
This is very cool but the use of the terms "Information Entropy" together as if they were a separate thing is maybe the furthest that any "ATM-machine"-type phrase has rustled my jimmies.
It is a cool article, just, wowzers, what a phrase.
in 02016 it was patented magic in many countries. now the patents have all expired, or will in a few months, since the standard was released in august 02004 after a year of public standardization work, patents only last 20 years from filing, and you can't file a patent on something that's already public (except that, in the usa, there's a one-year grace period if you're the one that published it). if there are any exceptions, i'd be very interested to hear of them
userbinator points at https://meta.m.wikimedia.org/wiki/Have_the_patents_for_H.264..., but most of the patents there have a precedence date after the h.264 standard was finalized, and therefore can't be necessary to implement h.264 itself (unless the argument is that, at the time it was standardized, it was not known to be implementable, a rather implausible argument)
what's surprising is that the last 20 years have produced a few things that are arguably a little better, but nothing that's much better, at least according to my tests of the implementations in ffmpeg
it seems likely that its guaranteed patent-free status will entrench it as the standard codec for the foreseeable future, for better or worse. av1 has slightly better visual quality at the same bandwidth, but is much slower (possibly this is fixable by a darktangent), but it's vulnerable to patents filed by criminals as late as 02018
> Because you've asked the decoder to jump to some arbitrary frame, the decoder has to redo all the calculations - starting from the nearest I-frames and adding up the motion vector deltas to the frame you're on - and this is computationally expensive, and hence the brief pause.
That was 2016. Today, it's because Youtube knows you're on Firefox.
V-Nova has developed a very interesting video compression algorithm, now in use at some major media companies [0]. It is software, but it will also be built into certain hardware for smart TVs, etc.
One of my surprises when I started to investigate video encoders was that, because the detail in a scene lives in the high frequencies, you can drop the bandwidth requirements on the fly by just not sending the details for a frame.
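That "drop the details" trick is easy to sketch with a 2-D DCT (scipy assumed; the block size and the `keep` cutoff here are arbitrary choices for illustration):

```python
import numpy as np
from scipy.fft import dctn, idctn

def drop_detail(block: np.ndarray, keep: int) -> np.ndarray:
    """Keep only the lowest `keep` x `keep` DCT coefficients of a block,
    discarding the high-frequency detail the eye misses first."""
    coeffs = dctn(block, norm="ortho")
    mask = np.zeros_like(coeffs)
    mask[:keep, :keep] = 1.0
    return idctn(coeffs * mask, norm="ortho")

# Keeping 4x4 of an 8x8 block's 64 coefficients means sending roughly a
# quarter of the data; what is lost is fine texture, not the overall image.
block = np.random.default_rng(0).random((8, 8))
approx = drop_detail(block, 4)
```

Because the low-frequency coefficients carry most of a natural image's energy, an encoder can shrink bitrate mid-stream simply by quantizing or zeroing more of the high-frequency tail.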
I think VVC is even better; MXPlayer streaming already has shows streaming in VVC in India! I don't know how they're doing it, but it's live. VVC is supposedly 20-30% more efficient than AV1.
peutetre | 1 year ago
For their use case, Meta is gradually getting to a baseline of VP9 and AV1 streams for video: https://www.streamingmedia.com/Producer/Articles/Editorial/F...
And AV1 for video calls: https://engineering.fb.com/2024/03/20/video-engineering/mobi...
Microsoft is starting to use AV1 in Teams. AV1 has video coding tools that are particularly useful for screen sharing: https://techcommunity.microsoft.com/t5/microsoft-teams-blog/...
Most of the video I see on YouTube these days is in VP9 or AV1. Occasionally some video is still H.264.
H.264 will still be around for quite some time, but AV1 looks set to become the new baseline for internet video.
wolfspider | 1 year ago
mananaysiempre | 1 year ago
AV1 is bloody awful to encode in software.
MrDrMcCoy | 1 year ago
sambazi | 1 year ago
can't shake the feeling that most videos at 720p have taken a hard hit to quality over the last few years (or my eyes are getting better)
any deep dives on the topic appreciated
Dwedit | 1 year ago
dang | 1 year ago
H.264 is Magic (2016) - https://news.ycombinator.com/item?id=30710574 - March 2022 (219 comments)
H.264 is magic (2016) - https://news.ycombinator.com/item?id=19997813 - May 2019 (180 comments)
H.264 is Magic – a technical walkthrough - https://news.ycombinator.com/item?id=17101627 - May 2018 (1 comment)
H.264 is Magic - https://news.ycombinator.com/item?id=12871403 - Nov 2016 (219 comments)
userbinator | 1 year ago
https://meta.wikimedia.org/wiki/Have_the_patents_for_H.264_M...
This is not surprising, given that the first version of the H.264 standard was published in 2003 and patents are usually valid for 20 years.
Its predecessors, H.263 and MPEG-4 ASP, have already had their patents expire and are in the public domain.
amelius | 1 year ago
jpm_sd | 1 year ago
https://en.wikipedia.org/wiki/High_Efficiency_Video_Coding
rebeccaskinner | 1 year ago
adzm | 1 year ago
cornstalks | 1 year ago
TacticalCoder | 1 year ago
SirMaster | 1 year ago
https://en.wikipedia.org/wiki/Versatile_Video_Coding
ezoe | 1 year ago
HEVC is patent-encumbered, and users on old hardware don't have a hardware decoder, so both require more time to be adopted.
asveikau | 1 year ago
It sure gets the file size down though.
g4zj | 1 year ago
NewJazz | 1 year ago
lupusreal | 1 year ago
Agingcoder | 1 year ago
cheema33 | 1 year ago
tasty_freeze | 1 year ago
tasty_freeze | 1 year ago
https://www.zdnet.com/article/high-hopes-for-video-compressi...
It says they raised $32M from VCs, came out of stealth mode in 2002. I was unable to find what happened after that.
thewarpaint | 1 year ago
ElFitz | 1 year ago
zabeltech | 1 year ago
tbalsam | 1 year ago
kragen | 1 year ago
kazinator | 1 year ago
bob1029 | 1 year ago
The core of any lossy visual compressor is approximately the same theme. Those I-frames are essentially jpegs.
IvanK_net | 1 year ago
1015 kB: https://sidbala.com/content/images/2016/11/outputFrame.png
94 kB: https://www.photopea.com/keynote.png
isametry | 1 year ago
SigmundA | 1 year ago
simonebrunozzi | 1 year ago
Disclosure: I'm an investor.
[0]: https://www.v-nova.com/
tomjen3 | 1 year ago
Rakshith | 1 year ago
samlinnfer | 1 year ago
eviks | 1 year ago
bigstrat2003 | 1 year ago
P_I_Staker | 1 year ago
ngcc_hk | 1 year ago
ant6n | 1 year ago