
H.264 Is Magic (2016)

333 points | tosh | 1 year ago | sidbala.com

205 comments

[+] peutetre|1 year ago|reply
AV1 is more magic with better licensing.

For its own use case, Meta is gradually moving to a baseline of VP9 and AV1 streams for video streaming: https://www.streamingmedia.com/Producer/Articles/Editorial/F...

And AV1 for video calls: https://engineering.fb.com/2024/03/20/video-engineering/mobi...

Microsoft is starting to use AV1 in Teams. AV1 has video coding tools that are particularly useful for screen sharing: https://techcommunity.microsoft.com/t5/microsoft-teams-blog/...

Most of the video I see on YouTube these days is in VP9 or AV1. Occasionally some video is still H.264.

H.264 will still be around for quite some time, but AV1 looks set to become the new baseline for internet video.

[+] wolfspider|1 year ago|reply
Well sure, but hardware encode and decode isn’t completely widespread yet. I’ve been patiently waiting for what has felt like an eternity. From the developer perspective, everyone needs to have access to it, or I’m just sitting on my hands waiting for the capabilities to trickle down to the majority of users. Hopefully more users will purchase hardware if it features AV1 encode/decode. They need a logo that says “AV1 inside” or something. So far, for example, only the iPhone 15 Pro offers hardware decode in the iPhone lineup.
[+] mananaysiempre|1 year ago|reply
> AV1 is more magic with better licensing.

AV1 is bloody awful to encode in software.

I was recording videos of lectures (a slowly moving person against a whiteboard with thin sharp lines on it, in a questionably well-lit room) and needed to encode them to smallish 720p files to be posted somewhere (ended up uploading them to the Internet Archive). This is not something that x264’s default encoding profiles do well on, but with a day or two of fiddling with the settings I had something I could run on my 2014-era iGPU-only laptop the night after the lecture and have the result up the next day.

By contrast, libaom promised me something like a week of rendering time for the three hours of video. Perhaps this could be brought down (my x264 fiddling got me veryslow-comparable results several times faster), but the defaults were so bad I couldn’t afford to experiment. (This was some four years ago. Things have probably gotten better since then, but I don’t expect miracles.)

[+] MrDrMcCoy|1 year ago|reply
In an 8-bit colorspace, h.264/5 suffer from virtually no block artifacts. AV1 can't get rid of them without upping to 10-bit. Not that it's necessarily a problem.

The real problem with AV1 is how darn computationally intensive it is to compress.

[+] sambazi|1 year ago|reply
> Most of the video I see on YouTube these days is in VP9 or AV1. Occasionally some video is still H.264.

can't shake the feeling that most 720p videos have taken a hard hit in quality over the last few years (or my eyes are getting better)

any deep dives on the topic appreciated

[+] Dwedit|1 year ago|reply
AV1 doesn't have the fast encode/decode speed that H.264 has.
[+] userbinator|1 year ago|reply
8 years after that article was written, many of the patents on H.264 are approaching expiry (i.e. within a year or two):

https://meta.wikimedia.org/wiki/Have_the_patents_for_H.264_M...

This is not surprising, given that the first version of the H.264 standard was published in 2003 and patents are usually valid for 20 years.

Its predecessors, H.263 and MPEG-4 ASP, have already had their patents expire and are in the public domain.

[+] amelius|1 year ago|reply
I bet that the successor algorithms will be implemented everywhere in hardware, and we're stuck with the patent problem once again.
[+] jpm_sd|1 year ago|reply
Well OK, but what about H.265? It's one louder, isn't it?

https://en.wikipedia.org/wiki/High_Efficiency_Video_Coding

[+] rebeccaskinner|1 year ago|reply
I do a lot of video compression for hobby projects, and I stick with h264 for the most part because h265 encoding requires far too much extra compute relative to the space savings. I can spend an hour compressing a file down to 1gb with h264, or I can spend 12 hours compressing the same file to 850mb with h265. Depending on the use-case, I might still need the h264 version anyway since it's far more widely supported by clients. If I had a data center worth of compute to throw at encoding, or I were running a streaming service where the extra 150mb per video started to add up, then I'd definitely be on board with h265, but it's really hard to justify for a lot of practical use-cases.
[+] adzm|1 year ago|reply
I really like HEVC/h265. It's pretty much on par with VP9, but licensing trouble has made it difficult to get adopted everywhere even now. VVC/h266 seems to be having the same issues; AV1 is pretty much just as good and already seeing much more adoption.
[+] cornstalks|1 year ago|reply
I've always felt like H.264 hit a great sweet spot of complexity vs compression. Newer codecs compress better, but they're increasingly complex in a nonlinear way.
[+] TacticalCoder|1 year ago|reply
> Well OK, but what about H.265? It's one louder, isn't it?

I had some old, gigantic, video footage (in a variety of old, inefficient, formats at a super high quality).

So I did some testing and, well, ended up re-encoding/transcoding everything to H.265. It makes for much smaller files than H.264. The standard is also ten years younger than H.264 (2013 for H.265 vs 2003 for H.264).

[+] ezoe|1 year ago|reply
HEVC and AV1 are twice the magic. They keep the same quality to human eyes while reducing the bitrate by 50%.

HEVC is patent-encumbered, and users on old hardware don't have hardware decoders. So both require more time to be adopted.

[+] asveikau|1 year ago|reply
Maybe I need to change some options, but I find whatever encoder I get when I ask ffmpeg to encode h265 takes a very long time. Then decoding 4k h265 is very slow on most of my PCs.

It sure gets the file size down though.

[+] g4zj|1 year ago|reply
> Suppose you have some strange coin - you've tossed it 10 times, and every time it lands on heads. How would you describe this information to someone? You wouldn't say HHHHHHHHH. You would just say "10 tosses, all heads" - bam! You've just compressed some data!

There appears to be some lossy compression on that string of "H"s.

[+] NewJazz|1 year ago|reply
Also, it's not really compression because "10 tosses, all heads" is more characters.
[+] lupusreal|1 year ago|reply
Try saying each of those though.
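The run-length idea behind that quote (and the character-count objection above) can be sketched in a few lines of Python. The helper names here are invented for illustration; real codecs encode run lengths as compact binary, not as text, which is why "10 tosses, all heads" wins even though it has more characters than "HHHHHHHHHH".

```python
from itertools import groupby

def rle_encode(s: str) -> list[tuple[str, int]]:
    """Collapse runs of repeated symbols into (symbol, count) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(ch * n for ch, n in pairs)

coin = "H" * 10
encoded = rle_encode(coin)          # one (symbol, count) pair: [('H', 10)]
assert encoded == [("H", 10)]
assert rle_decode(encoded) == coin  # lossless round trip
```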
[+] Agingcoder|1 year ago|reply
I remember when h264 appeared.

At that time, I was obsessed with mplayer and would download and build the latest releases on a regular basis. The first time I got an h264 file, mplayer wouldn’t read it so I had to get the dev version and build it.

It worked, and I realized two things: the quality was incredible, and my athlon 1800+ couldn’t cope with it. Subsequent mplayer versions (or libavcodec?) vastly improved performance, but I still remember that day.

[+] cheema33|1 year ago|reply
Yep. Same here. Man, I have not used mplayer in a very long time. But, it was the bee's knees.

I used to work for a company that was developing some video based products. And this other company in Vegas sold our bosses a "revolutionary video codec" and a player. We had to sign NDAs to use it.

I used it and observed that it worked like mplayer. Too much like it. 5 more minutes of investigation and the jig was up. Our execs who paid a lot of money to the Vegas company had egg on their faces.

It is shockingly easy to scam non-tech decision makers in the tech field. They are too worried about being left behind. Smart ones will pull in a "good" engineer for evaluation. Dunning-Kruger subjects will line up with their wallets out.

[+] tasty_freeze|1 year ago|reply
Story time.

Back in 1999, I was leaving a startup that had been acquired, but I didn't want to stick around. I had been doing MPEG encoding; this is relevant later.

One of the companies I interviewed with had come up with a new video compression scheme. They were very tight lipped but after signing NDA paperwork, they showed me some short clips they had encoded/decoded via a non-realtime software codec. I was interviewing to create an ASIC version of the algorithm. Even seeing just a minute or two of their codec output, I guessed what they were doing. I suggested that their examples were playing to the strengths of their algorithm and suggested some scenarios that would be more challenging. I also described what I thought they were doing. They neither confirmed nor denied, but they had me come back for a second round.

In the second round I talked with the founders, a husband/wife CEO/CTO (IIRC) team. That is when I learned their plan wasn't to sell ASICs, but to keep their codec secret and instead build a DSL-based network using the ASICs for video distribution. I said something like, it sounds like you have invented a better carburetor and instead of selling carburetors you are planning on building a car factory to compete with GM. It was cheeky, and they didn't take it well.

Getting to the point of how this story relates to H.264, their claim was: conventional compression has reached the limit, and so their codec would allow them to do something nobody else can do: send high-def video over DSL lines. I replied: I think compressors will continue to get better, and even if not, higher speed internet service will eventually come to houses and remove the particular threshold they think only they can cross. Oh, no, they replied, physics dictates the speed bits can be sent on a wire and we are at that limit.

I didn't get (and didn't want) a job offer. The company did get some VC money but folded a few years later, much more efficient codecs were developed by others, and 2 Mbps internet connections were not a limit. I'm sure the actual algorithm (which I describe later at a high level) had a lot of clever math and algorithmic muscle to make it work -- they weren't stupid technically, just stupid from a business sense.

This retelling makes me sound like a smug know-it-all, but it is one of two times in my life where I looked at something and in seconds figured out their secret sauce. There are far more examples where I am an idiot.

What was their algorithm? They never confirmed it, but based on silhouette artifacts, it seemed pretty clear what they were doing. MPEG (like jpeg) works by compressing images in small blocks (8x8, 16x16, and a few others). That limits how much spatial redundancy can be used to compress the image, but it also limits the computational costs of finding that redundancy. Their codec was similar to what Microsoft had proposed for their Talisman graphics architecture in the late 90s.

From what I could tell, they would analyze a sequence of frames and rather than segmenting the image into fixed blocks like mpeg, they would find regions with semi-arbitrary boundaries that were structurally coherent -- eg, if the scene was a tennis match, they could tell that the background was pretty "rigid" -- if a pixel appeared to move (the camera panned) then the nearby pixels were highly likely to make the same spatial transformation. Although each player changed frame to frame, that blob had some kind of correlation in lighting and position. Once identified, they would isolate a given region and compress that image using whatever (probably similar to jpeg). In subsequent frames they'd analyze the affine (or perhaps more general) transformation from a region from one frame to the next, then encode that transformation via a few parameters. That would be the basis of the next frame prediction, and if done well, not many bits need to be sent to fix up the misprediction errors.
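The core trick described above, sending a handful of transform parameters per region instead of the region's pixels, can be sketched in a few lines. Everything here is invented for illustration; the real system would fit these parameters by analyzing motion between frames and then code only the misprediction residual.

```python
def affine_apply(params, x, y):
    """Map point (x, y) through the affine transform
    (x', y') = (a*x + b*y + tx, c*x + d*y + ty)."""
    a, b, c, d, tx, ty = params
    return (a * x + b * y + tx, c * x + d * y + ty)

# A camera pan is just a translation: a = d = 1, b = c = 0, shift by (5, 0).
pan = (1.0, 0.0, 0.0, 1.0, 5.0, 0.0)
assert affine_apply(pan, 10, 20) == (15.0, 20.0)

# Six floats per region per frame is far cheaper than re-sending pixels:
# the decoder warps the previous frame's region through the transform and
# only the (hopefully small) prediction error needs additional bits.
```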

[+] thewarpaint|1 year ago|reply
> they weren't stupid technically, just stupid from a business sense

I wouldn’t call them “stupid technically”, but assuming there was a hard limit to codec efficiency and to internet bandwidth would, in my opinion, at least make them technically naive.

[+] ElFitz|1 year ago|reply
These stories are a big part of what I love in HN. Thank you.
[+] tbalsam|1 year ago|reply
This is very cool but the use of the terms "Information Entropy" together as if they were a separate thing is maybe the furthest that any "ATM-machine"-type phrase has rustled my jimmies.

It is a cool article, just, wowzers, what a phrase.

[+] kragen|1 year ago|reply
in 02016 it was patented magic in many countries. now the patents have all expired, or will in a few months, since the standard was released in august 02004 after a year of public standardization work, patents only last 20 years from filing, and you can't file a patent on something that's already public (except that, in the usa, there's a one-year grace period if you're the one that published it). if there are any exceptions, i'd be very interested to hear of them

userbinator points at https://meta.m.wikimedia.org/wiki/Have_the_patents_for_H.264..., but most of the patents there have a precedence date after the h.264 standard was finalized, and therefore can't be necessary to implement h.264 itself (unless the argument is that, at the time it was standardized, it was not known to be implementable, a rather implausible argument)

what's surprising is that the last 20 years have produced a few things that are arguably a little better, but nothing that's much better, at least according to my tests of the implementations in ffmpeg

it seems likely that its guaranteed patent-free status will entrench it as the standard codec for the foreseeable future, for better or worse. av1 has slightly better visual quality at the same bandwidth, but is much slower (possibly this is fixable by a darktangent), but it's vulnerable to patents filed by criminals as late as 02018

[+] kazinator|1 year ago|reply
> Because you've asked the decoder to jump to some arbitrary frame, the decoder has to redo all the calculations - starting from the nearest I-frames and adding up the motion vector deltas to the frame you're on - and this is computationally expensive, and hence the brief pause.

That was 2016. Today, it's because Youtube knows you're on Firefox.
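The seek behavior the quote describes can be modeled in a few lines: the stream stores occasional full keyframes (I-frames) and per-frame deltas, and seeking to an arbitrary frame means replaying deltas from the nearest preceding keyframe. The data and interval here are invented for illustration.

```python
KEYFRAME_INTERVAL = 4

# frames[i] is a full value at keyframes, otherwise a delta from frame i-1.
stream = [100, +2, -1, +3, 110, +1, +1, -2]   # keyframes at indices 0 and 4

def seek(stream, t):
    """Reconstruct frame t by replaying deltas from the nearest keyframe."""
    k = (t // KEYFRAME_INTERVAL) * KEYFRAME_INTERVAL
    value = stream[k]                 # start from the full keyframe...
    for i in range(k + 1, t + 1):     # ...then apply each delta in order
        value += stream[i]
    return value

assert seek(stream, 3) == 104         # 100 + 2 - 1 + 3
assert seek(stream, 6) == 112         # 110 + 1 + 1 (no need to touch frames 0-3)
```

This is also why seeking cost depends on how far the target frame is from its I-frame: the longer the keyframe interval, the more deltas must be replayed.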

[+] bob1029|1 year ago|reply
My favorite part of these codecs is how the discrete cosine transform plays so well together with quantization, zig-zag scanning and entropy coding.

The core of any lossy visual compressor is approximately the same theme. Those I-frames are essentially jpegs.
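The interplay mentioned above can be shown with a toy example: after quantization, a DCT block's surviving energy clusters in the top-left (low frequencies), and the zig-zag scan linearizes the block so those come first, leaving one long run of zeros for the entropy coder. This uses a 4x4 block instead of the usual 8x8, and the coefficient values are invented.

```python
def zigzag(block):
    """Read an NxN block along anti-diagonals, alternating direction,
    so low-frequency (top-left) coefficients come out first."""
    n = len(block)
    order = sorted(
        ((i, j) for i in range(n) for j in range(n)),
        # primary key: which anti-diagonal; secondary key: direction alternates
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )
    return [block[i][j] for i, j in order]

# A quantized 4x4 DCT block: only low-frequency coefficients survived.
block = [
    [40, -3, 0, 0],
    [ 2,  0, 0, 0],
    [ 0,  0, 0, 0],
    [ 0,  0, 0, 0],
]
scanned = zigzag(block)
assert scanned[:3] == [40, -3, 2]        # energy is front-loaded...
assert all(c == 0 for c in scanned[3:])  # ...leaving one long zero run to entropy-code
```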

[+] IvanK_net|1 year ago|reply
Lossy compression is possible in PNG to a certain degree. This is the "same" PNG, but 94 kB instead of 1015 kB:

1015 kB: https://sidbala.com/content/images/2016/11/outputFrame.png

94 kB: https://www.photopea.com/keynote.png

[+] SigmundA|1 year ago|reply
That's not PNG being lossy; the image was preprocessed to lose information by reducing the color count before being compressed with PNG.
[+] simonebrunozzi|1 year ago|reply
V-Nova has developed a very interesting video compression algorithm, now in use at some major media companies [0]. It is software, but it will also be built into certain hardware for Smart TVs, etc.

Disclosure: I'm an investor.

[0]: https://www.v-nova.com/

[+] tomjen3|1 year ago|reply
One of my surprises when I started to investigate video encoders was that, because the detail in a scene lives in the high frequencies, you can drop the bandwidth requirements on the fly by just not sending the detail for a frame.
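That graceful-degradation idea can be demonstrated with a naive 1D DCT: zero out the high-frequency half of the coefficients and the reconstruction still tracks the overall shape, losing only the fine detail. The signal values here are invented, and this O(n^2) transform is purely illustrative (real codecs use fast 2D versions).

```python
import math

def dct(xs):
    """Naive DCT-II: low indices hold slow trends, high indices fine detail."""
    n = len(xs)
    return [sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, x in enumerate(xs)) for k in range(n)]

def idct(cs):
    """Naive inverse (DCT-III), scaled to undo dct() above exactly."""
    n = len(cs)
    return [(cs[0] / 2 + sum(cs[k] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                             for k in range(1, n))) * 2 / n for i in range(n)]

signal = [3, 3, 3, 3, 10, 10, 10, 10]   # flat regions with one sharp edge
coeffs = dct(signal)
coeffs[4:] = [0] * 4                     # drop the high-frequency half on the fly
approx = idct(coeffs)
# The reconstruction still follows the overall shape; only the edge blurs.
assert all(abs(a - s) < 3 for a, s in zip(approx, signal))
```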
[+] Rakshith|1 year ago|reply
I think VVC is even better; MXPlayer streaming already has shows streaming in VVC in India! I don't know how they're doing it, but it's live. VVC is supposedly 20-30% more efficient than AV1.
[+] samlinnfer|1 year ago|reply
To nitpick, comparing with PNG is misleading, because it's comparing a lossless format with a lossy one. A JPEG would be around the same size as H.264.
[+] eviks|1 year ago|reply
it's not misleading, but leading - to the concept of lossy compression
[+] bigstrat2003|1 year ago|reply
Yeah that jumped out at me as well. It's a really unfair comparison.
[+] P_I_Staker|1 year ago|reply
If it's magic, its spells have been greatly disenchanted. It's like Gandalf.