top | item 42042706


unlord | 1 year ago

It is actually quite a bit more misleading. I was not able to reproduce these numbers on Zen2 hardware, see https://people.videolan.org/~unlord/dav1d_6tap.png. I spoke with the slide author and he confirmed he was using an -O0 debug build of the checkasm binary.

What's more, the C code is running an 8-tap filter where the SIMD for that function (in all of SSSE3, AVX2 and AVX512) is implemented as 6-tap. Last week I posted MR !1745 (https://code.videolan.org/videolan/dav1d/-/merge_requests/17...) which adds 6-tap to the C code and brings improved performance to all platforms dav1d supports.
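To make the 8-tap versus 6-tap point concrete, here is a minimal sketch in plain C. It is not dav1d's code: the coefficient values and the names `taps8`, `filter_8tap` and `filter_6tap` are made up for illustration. It assumes a kernel whose two outer taps are zero, in which case a 6-tap loop gives the same sum with two fewer multiply-adds per output sample.

```c
#include <stdint.h>

/* Hypothetical 8-tap kernel whose outer taps are zero.
 * The values are illustrative only (they sum to 64), not dav1d's tables. */
static const int8_t taps8[8] = { 0, 2, -10, 40, 40, -10, 2, 0 };

/* Full 8-tap sum: reads src[-3] .. src[4]. */
static int filter_8tap(const uint8_t *src, const int8_t *f) {
    int sum = 0;
    for (int i = 0; i < 8; i++)
        sum += f[i] * src[i - 3];
    return sum;
}

/* 6-tap sum over the same kernel: skips f[0] and f[7],
 * so it only reads src[-2] .. src[3]. */
static int filter_6tap(const uint8_t *src, const int8_t *f) {
    int sum = 0;
    for (int i = 1; i < 7; i++)
        sum += f[i] * src[i - 3];
    return sum;
}
```

If the C reference runs the 8-tap loop while the assembly runs the 6-tap one, a benchmark is comparing different amounts of work, not just C against SIMD.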

This, of course, also closes the gap in these numbers but is a more accurate representation of the speed-up from hand-written assembly.


zbobet2012|1 year ago

The thing I found interesting is the AVX512 gains over AVX2. That's a pretty nice gain from the wider instruction set, which has often been ignored in the video community.

wscott|1 year ago

The important thing to understand about why AVX512 is a big deal is not the width. AVX512 adds new instructions and new instruction encodings. They doubled the number of registers (16->32), and added mask registers that allow you to remove special cases at the end of loops when the array length is not a multiple of the vector width. And there are piles of new permutation and integer operations that make it useful in more cases.
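A rough sketch of the mask-register idea, written as a scalar C model so it runs anywhere. It emulates the behavior and does not use real intrinsics; `LANES`, `masked_add_lanes` and `add_arrays` are hypothetical names. With real AVX512 the inner helper would be something like a masked load, an add and a masked store (for example `_mm512_maskz_loadu_epi32` and `_mm512_mask_storeu_epi32`). The point is that one loop body handles both full vectors and the ragged tail.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16  /* 16 x 32-bit lanes, as in one 512-bit register */

/* Scalar stand-in for a masked vector add: only lanes whose mask bit
 * is set are read or written, so inactive lanes never touch memory. */
static void masked_add_lanes(int32_t *dst, const int32_t *a,
                             const int32_t *b, uint16_t mask) {
    for (int lane = 0; lane < LANES; lane++)
        if (mask & (1u << lane))
            dst[lane] = a[lane] + b[lane];
}

/* dst[i] = a[i] + b[i] for any n, with no separate scalar tail loop. */
static void add_arrays(int32_t *dst, const int32_t *a,
                       const int32_t *b, size_t n) {
    for (size_t i = 0; i < n; i += LANES) {
        size_t left = n - i;
        /* All 16 lanes active, or just the low `left` lanes on the last pass. */
        uint16_t mask = left >= LANES ? 0xFFFF
                                      : (uint16_t)((1u << left) - 1);
        masked_add_lanes(dst + i, a + i, b + i, mask);
    }
}
```

Without masks, the usual AVX2 pattern is a vector loop over the full blocks plus a scalar or narrower cleanup loop for the leftover elements.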

The part Intel struggles with is that in many places, if they kept a 256-bit maximum width but included all the new operations, they could build a machine that is faster than the 512-bit version (assuming the same code was written for both vector widths). The reason is that the ALUs could be faster and you could have more of them.

Dylan16807|1 year ago

For most operations, on most CPUs, you can get the same results with twice as many AVX2 instructions. And that's excluding the CPUs with no AVX512 at all.

But the number of situations where AVX512 has a significant advantage is growing, so interest will grow alongside it.

jsheard|1 year ago

Sadly AVX512 is still very easy to ignore given how badly Intel has botched its rollout across consumer processors.

Ironically AMD has stronger AVX512 support at this point despite the spec originating at Intel.