
Next Generation Video: Introducing Daala

215 points| metajack | 12 years ago |people.xiph.org

34 comments

[+] gioele|12 years ago|reply
Monty's ability to explain convoluted technical topics is astounding. He explains everything in plain words yet it remains technical and scientific enough.
[+] rikacomet|12 years ago|reply
Indeed, I never knew this stuff... and since CS is not my field, I might never have understood it.
[+] znowi|12 years ago|reply
I'm surprised to learn that modern codecs are so conservative, iterating on tech invented decades ago. I expected this field to be rampant with cutting-edge research techniques. As the article says, the lapped transform dates back to the early 90s, and it's considered "the next-next-generation" today. Why is that?
[+] ZeroGravitas|12 years ago|reply
Well, if a government-blessed cartel ran industry X, with basically only one product and no competition, replaced on a predetermined 10-year cycle after a series of long political committee meetings, and if they'd built a massive patent thicket to prevent upstart competition and control rivals' business models, how much disruptive innovation would you expect to see?
[+] wolfgke|12 years ago|reply
That's what (software) patents are for (and video codecs are a heavily patent-encumbered field): slowing down progress with patent minefields you have to pay attention to when implementing a codec. It can easily happen that you miss a patent or can't circumvent it; then you have to pay protection money...

A perfect ground for innovation...

[+] antninja|12 years ago|reply
I remember the BBC's Dirac codec uses wavelets. I don't know how it performs today; no one has adopted it.
[+] rikacomet|12 years ago|reply
All modern high-tech things are somehow unique implementations of different basic techniques.

A Russian hook, for example, was considered ultra-modern in martial arts at one time, but in principle it used the same old muscles and some basic steps, just with a new implementation.

[+] jwr|12 years ago|reply
This is very good news. I'm glad to finally see some real development of video encoding. I was disappointed to see Google put its weight behind VP9, which was basically the same set of technologies as H.264, with (almost) the same set of patent encumbrances.

The ideas behind Daala, while not revolutionary, are enough to make it quite different from everything else out there. I also hope it means it won't infringe on every patent out there, just some of them.

[+] ZeroGravitas|12 years ago|reply
As the article points out, traditional codec development is based on video codec developments going back 25 years (and fundamental maths going back much further), so it's not the basics that are patented. Instead it's the fairly idiosyncratic design choices, because that's the MVP (minimum viable patent) you need to insert into the spec to get a share of the royalties.

In short, the patent stuff has always been mostly threats from the incumbents, which are not empty but don't actually rely on any of the underlying patents being sound. Google gets a lot of abuse from the peanut gallery for not nuking H.264 support in Chrome but they seem to be doing amazing work behind the scenes to get the incumbents on board with free codecs. Not ever going to earn you as much geek cred as rebooting video codec development down a different (and quite possibly better) path, but still a worthy endeavour in my mind.

[+] gillianseed|12 years ago|reply
Well, Google bought On2, which had been developing video codecs for a long time prior to Google's purchase, and VP8/VP9 is what has resulted from continued development of On2's codec technology.

Very much looking forward to what Daala can bring, hopefully the different approach it goes for can bring competitive quality.

[+] arianvanp|12 years ago|reply
Speaking of 'original' formats: why would one prefer the Discrete Cosine Transform to the Discrete Fourier Transform? I've only worked with the Fourier transform, and I'm wondering if there are any benefits.
[+] 0x09|12 years ago|reply
Since the DCT has an implicit even extension, as opposed to the DFT, which is strictly periodic, it's much more resistant to removal of high frequencies. Most signals (especially image blocks) aren't periodic, so the DFT uses a lot of energy to represent the discontinuity at the edge of the signal. The DCT avoids this discontinuity by implicitly mirroring the signal at the edges. This is what is meant when the DCT is said to have better energy compaction.

  signal    DFT        DCT II  v-nice and smooth, easy for cos() to fit
    \_      \_ \_ \_         _/\_  _/
      \       \  \  \       /    \/  
               ^sharp, high-frequency discontinuity
                this will make a large value at the top of our spectrum
The other reason is that pixel data has no imaginary component, which makes half the DFT's spectrum redundant, while the DFT requires complex math regardless. The DCT is purely real-valued, so it's a matter of one real output per one real input.

Since the DCT is just an even and real-valued special case of the DFT, those two make up the whole difference.
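
A quick numpy sketch of the energy-compaction point, using a hypothetical zero-mean ramp signal (smooth inside the block, but with a jump in its periodic extension) and the DCT-II written out from its textbook definition rather than a library call:

```python
import numpy as np

# A zero-mean ramp: smooth inside the block, but its periodic
# extension (what the DFT assumes) has a sharp jump at the edge.
N = 32
x = np.arange(N, dtype=float) - (N - 1) / 2

# One-sided DFT magnitudes for the real-valued input.
dft = np.abs(np.fft.rfft(x))

# Unnormalized DCT-II, straight from the definition:
#   X_k = sum_n x_n * cos(pi/N * (n + 0.5) * k)
n = np.arange(N)
k = n.reshape(-1, 1)
dct = np.cos(np.pi / N * (n + 0.5) * k) @ x

def energy_in_lowest(coeffs, m):
    """Fraction of total energy in the m lowest-frequency coefficients."""
    e = coeffs ** 2
    return e[:m].sum() / e.sum()

# The DCT packs nearly all the ramp's energy into a few low
# frequencies; the DFT spends a lot of energy on the edge jump.
print(f"DFT: {energy_in_lowest(dft, 4):.3f}")
print(f"DCT: {energy_in_lowest(dct, 4):.3f}")
```

The fraction-of-energy measure is normalization-independent, so comparing unnormalized transforms here is fair.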

[+] 0x006A|12 years ago|reply
One property of the DCT that makes it quite suitable for compression is its high degree of "spectral compaction": at a qualitative level, a signal's DCT representation tends to have more of its energy concentrated in a small number of coefficients when compared to other transforms like the DFT. This is desirable for a compression algorithm.
[+] nullc|12 years ago|reply
I still think we should have pointed out that the same thing that makes Feistel ciphers invertible is what makes integer lifting invertible.
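
The shared trick: each step modifies one half of the data using only the other, untouched half, so it can be undone exactly by subtracting the same quantity, however lossy the rounding inside it is. A minimal sketch (a Haar-like integer lifting step, not Daala's actual filters):

```python
def lifting_forward(x):
    """One Haar-like integer lifting step (the S-transform).

    Each substep adds a function of one half of the samples to the
    other half; since the half used as input is left untouched, the
    substep can always be undone by subtracting the same function,
    exactly as a Feistel round can always be undone by reapplying F.
    """
    even, odd = x[0::2], x[1::2]
    detail = [o - e for o, e in zip(odd, even)]            # predict
    approx = [e + (d >> 1) for e, d in zip(even, detail)]  # update
    return approx, detail

def lifting_inverse(approx, detail):
    even = [s - (d >> 1) for s, d in zip(approx, detail)]  # undo update
    odd = [d + e for d, e in zip(detail, even)]            # undo predict
    out = []
    for pair in zip(even, odd):
        out.extend(pair)
    return out

x = [10, 13, 25, 24, 0, 255, 7, 6]
approx, detail = lifting_forward(x)
assert lifting_inverse(approx, detail) == x  # bit-exact, despite the >> 1
```

A Feistel round (L, R) -> (R, L XOR F(R)) inverts the same way: F is reapplied to the untouched half and XORed back out, so F itself never needs to be invertible.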
[+] mtgx|12 years ago|reply
They make the distinction on their page, too - this is a "next-next-generation" codec, not just a "next-generation" one like VP9 and HEVC.

So if this is finalized in 2-3 years, then it will be more of a competitor to VP10. I'm not sure MPEG-LA will release another one 2 years from now. They usually release one every 5 years or more, and VP9 managed to catch up with HEVC after only 2 years of work (work on HEVC started in 2008; work on VP9 in 2011), so we might see a VP10 in 2 years that is twice as good as VP9/HEVC, but not an h.266 codec that is twice as good as HEVC/h.265.

It will be interesting to see just how good Daala will be. If it's going to be released 2-3 years from now, then it should be at least 3x better than h.264, or at least 4x better to be safe, and to be worth the switch from HEVC/VP9 (and at least as good as VP10, if Google does indeed release VP10 around then, too). That would make 4k video as efficient as 1080p video with h.264 (file size/bandwidth-wise).
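
The "4x" target above maps onto raw pixel counts; a back-of-envelope check (assuming bitrate scales roughly linearly with pixel count, which real codecs only approximate):

```python
# UHD "4K" carries 4x the pixels of 1080p, so a codec about 4x as
# efficient as H.264 could, to a first approximation, deliver 4K at
# the bitrate H.264 needs for 1080p.
uhd_pixels = 3840 * 2160
full_hd_pixels = 1920 * 1080
ratio = uhd_pixels / full_hd_pixels
print(ratio)  # 4.0
```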

[+] anon1385|12 years ago|reply
>not just a "next-generation" one like VP9 and HEVC.

VP9 is not the same generation as HEVC. It's more comparable with H.264.

>VP9 managed to catch-up with HEVC after only 2 years of work

Do you mean in terms of how far along with standardisation/development they are? VP9 is ~6 months behind, although that's not really that significant in video standard timescales. If you mean quality then VP9 encoders have not caught up with x264, and that in itself may take years, if they ever do.

We were promised that VP8 would get a lot better over time, but it didn't because they basically gave up and concentrated on VP9. If they do the same again and start looking to VP10 then VP9 will flounder. Video codecs can't gain adoption with such short churn times. The reason there was so little time between VP8 and VP9 is that VP8 was technically so far behind it never had any chance of adoption.

MPEG-LA are not going to release a new codec in 2 years time. HEVC will probably only just be seeing mature implementations and widespread adoption by then. There is still a hell of a lot of MPEG-2 tech out there and H.264 has been out for a decade.

[+] 0x09|12 years ago|reply
> Not sure if MPEG-LA will release another one 2 years from now.

MPEG-LA aren't compression engineers or a standards body. They're a law firm in Colorado; the LA stands for "Licensing Authority." Thanks to the blog roll, half the internet can't distinguish between it and ISO WG-11, but it's not a trifling distinction.

[+] rikacomet|12 years ago|reply
So basically, how much would it save in cost/time/space over a 100 MB file, compared to current codecs? An example would have been nice for non-CS people like me.
[+] VikingCoder|12 years ago|reply
We're still at the research phase. The techniques seem promising, but there's still a lot of work to do before we get to the tuning stage. THEN you can have your estimates.