Why Ice Lake is Important – a bit-basher’s perspective

112 points| matt_d | 6 years ago |branchfree.org | reply

41 comments

[+] glangdale|6 years ago|reply

Author here in case anyone wants detail, to throw rocks, mock my limited understanding of Galois Field arithmetic, etc.

[+] lukego|6 years ago|reply

I really enjoy this summary of the AVX512 instruction set evolution. It makes me curious to know more about the AVX512 microarchitecture evolution too. How practical is AVX512 likely to be over the coming years? How much do we have to worry about side-effects like throttling the CPU clock speed down when the wide datapaths are exercised, etc?

Can someone wake me up when it is a no-brainer to use AVX512 instructions on CPUs that support them? :-)

[+] pbsd|6 years ago|reply

I would point out that there _are_ consumer-grade chips with AVX-512 beyond the Xeons and Cannonlake: the Skylake-X chips such as i7-7800X, which even happen to have 2 512-bit FMA units, unlike some of the cheaper Xeons.

I will point out three things that make the Sunny Cove exciting to me, beyond the obvious new instructions (based on the available slides):

- There are now two ports for vector shuffles. For shuffle-heavy applications, which is often the case with bit-manipulation kernels, this is great news. This seems improved from Cannonlake.

- This was already present on the Cannonlake, but integer division is drastically improved, and goes from ~30ish uops to 4. Divisions that could take up to 90 cycles will now take <=18.

- There are now 4 LEA ports, up from 2, which for address calculation and small integer multiplications are quite useful.

[+] renaudg|6 years ago|reply

What is the state of compiler support for these very advanced instruction sets? Can the average developer benefit by basically adding a few compiler flags?

Also, each new Intel platform seems to bring additional instructions, but most software isn't made available in a wide range of microarchitecture-specific builds. Is there typically capability detection going on behind the scenes?

Some of the operations described here seem so specific that I have a hard time imagining compilers being able to spot the relevant patterns in source code that can make use of them (then again, I'm not a specialist). I guess these are explicitly coded for in assembly?
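
For anyone wondering what "capability detection" amounts to, here is a rough sketch of the dispatch idea, assuming a Linux machine where the CPU features show up on the `flags` line of `/proc/cpuinfo`. The helper names (`parse_cpu_flags`, `pick_kernel`, `detect_kernel`) and the kernel labels are made up for illustration; compiled software would normally query CPUID directly rather than read a text file.

```python
# Sketch only: flag spellings follow Linux /proc/cpuinfo, and every function
# name and kernel label here is hypothetical.

def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_kernel(flags):
    """Choose the most specific code path the reported flags allow."""
    if {"avx512f", "gfni"} <= flags:
        return "avx512+gfni"
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "scalar"


def detect_kernel(path="/proc/cpuinfo"):
    """Read the flags of the running machine; fall back to scalar."""
    try:
        with open(path) as f:
            return pick_kernel(parse_cpu_flags(f.read()))
    except OSError:
        # File missing or unreadable (e.g. not Linux): take the safe path.
        return "scalar"
```

The point is that the decision happens once at startup, and the rest of the program calls whichever implementation was selected, so a single build can still use the newer instructions where they exist.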

[+] RcouF1uZ4gsC|6 years ago|reply

Thanks for the writeup. Do you see AVX512 or its successors being able to rival GPUs as the linear algebra champs in the near future?

[+] vkaku|6 years ago|reply

At this moment, as a consumer, I'm more concerned about Spectre mitigation impact, power draw and cost - among other things.

And currently, AMD is definitely winning there, and it may be my processor of choice for the next few years, until Intel fixes shortcomings in all those areas. 10nm is a step in the right direction, but the price/performance ratio is nowhere close to the mark.

But that's just my take on that.

[+] chithanh|6 years ago|reply

Something which I found odd in the article:

In the beginning you write that "VBMI [...] is the only extension that we’ve seen before – it’s in Cannonlake." but later you write that "VPOPCNTDQ is older (from the MIC product line)"

So which is it? Or am I misunderstanding something?

[+] btown|6 years ago|reply

I wonder if this will lead to significant speed ups in databases like ElasticSearch, where operations on large bitmaps (each pixel representing a term’s presence in a document) are commonplace. Do you have any insight into this?
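
To make the bitmap workload concrete, here is a minimal sketch, assuming one bit per document ID and one bitmap per term. It is not how any particular database stores its postings; the function names are invented for the example. The AND-then-count loop is the kind of operation a vector popcount such as VPOPCNTDQ is aimed at.

```python
# Sketch only: Python integers stand in for large bitmaps, and all names
# are hypothetical.

def make_bitmap(doc_ids):
    """Set bit d for every document ID d that contains the term."""
    bitmap = 0
    for doc_id in doc_ids:
        bitmap |= 1 << doc_id
    return bitmap


def popcount(bitmap):
    """Count the set bits in the bitmap."""
    return bin(bitmap).count("1")


def count_docs_with_all(term_bitmaps):
    """Count documents containing every term: AND the bitmaps, then popcount."""
    result = term_bitmaps[0]
    for bitmap in term_bitmaps[1:]:
        result &= bitmap
    return popcount(result)


ice = make_bitmap([0, 3, 5, 64, 200])
lake = make_bitmap([3, 5, 7, 200])
matches = count_docs_with_all([ice, lake])  # documents 3, 5 and 200
```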

[+] dmbaggett|6 years ago|reply

You mentioned there are NUCs that have AVX-512, which surprised me. Those looking for such a NUC: search for NUCs with the Core i3-8121U CPU. This is a cheap AVX-512 entry point if you want to experiment. (~$400 for the NUC).

[+] klyrs|6 years ago|reply

Fwiw bit bashing is great for fields of characteristic 2 and 3
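
A small illustration of the characteristic 2 case: in GF(2^8), addition is XOR and multiplication is shift-and-XOR with a reduction step. The sketch below assumes the AES reduction polynomial x^8 + x^4 + x^3 + x + 1, which as far as I know is also the one GF2P8MULB uses; the function name is made up.

```python
# Sketch only: bytewise GF(2^8) multiplication, assuming the AES polynomial.
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1


def gf256_mul(a, b):
    """Multiply two field elements (0..255) using shifts and XORs."""
    result = 0
    while b:
        if b & 1:
            result ^= a       # addition in characteristic 2 is XOR
        a <<= 1               # multiply a by x
        if a & 0x100:
            a ^= AES_POLY     # reduce back below degree 8
        b >>= 1
    return result
```

With this polynomial, `gf256_mul(0x57, 0x83)` gives `0xC1`, the worked example from the AES specification.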

[+] rb808|6 years ago|reply

> I will also admit that I’m not very interested in deep learning, floating point or crypto

I feel like I met that one other guy in the world.

[+] zeus_hammer|6 years ago|reply

There are dozens of us ... dozens!

[+] yifanl|6 years ago|reply

>>floating point

One of these is not like the others

[+] Narishma|6 years ago|reply

For anyone else confused by the title, this is about the upcoming Intel CPU generation.

[+] FullyFunctional|6 years ago|reply

GF2P8AFFINEQB sounds a lot like MXOR from Donald Knuth's MMIX (19 years prior). Now x86 just needs MOR as well. EDIT: This link may be helpful: https://math.stackexchange.com/questions/1107839/how-is-d-kn...
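
For readers who have not met these operations, both can be read as an 8x8 bit-matrix product that differs only in how the partial terms are combined: OR for MOR, XOR for MXOR. The sketch below uses a row and bit ordering picked for readability, which may not match MMIX or the Intel instruction, and it leaves out the constant term that an affine transform adds. All names are hypothetical.

```python
# Sketch only: a matrix is a list of 8 row bytes; bit j of row i is entry (i, j).
# The ordering is an assumption made for this example.

def bit_matrix_product(a_rows, b_rows, use_xor):
    """Compute c[i][j] = combine over k of (a[i][k] AND b[k][j])."""
    out = []
    for a_row in a_rows:
        acc = 0
        for k in range(8):
            if (a_row >> k) & 1:      # entry a[i][k] is set
                if use_xor:
                    acc ^= b_rows[k]  # combine with XOR (arithmetic in GF(2))
                else:
                    acc |= b_rows[k]  # combine with OR (boolean product)
        out.append(acc)
    return out


def mor(a_rows, b_rows):
    """OR-flavoured product, in the spirit of MMIX MOR."""
    return bit_matrix_product(a_rows, b_rows, use_xor=False)


def mxor(a_rows, b_rows):
    """XOR-flavoured product, in the spirit of MMIX MXOR."""
    return bit_matrix_product(a_rows, b_rows, use_xor=True)
```

The two only disagree when a row picks up the same output bit an even number of times: XOR cancels it, OR keeps it.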

[+] glangdale|6 years ago|reply

Thank you - this led to some interesting background. I could definitely have used MOR during bitwise regex (Glushkov NFA) implementation.

[+] bitL|6 years ago|reply

Do we know if Zen 2 is going to support AVX-512? Even if in 2x256-bit fashion... That could boost adoption as many enthusiasts will be changing platform soon.

[+] sandeatr|6 years ago|reply

It does not :(

[+] sandeatr|6 years ago|reply

I hope they release more information soon, like latency and throughput for all these avx512 instructions on ice lake.

No 2nd FMA really sucks, hope they add it when they do desktop.

I can't tell from the microarchitecture slide if the yellow box labeled "ALU" that is found on ports 0 and 5 refers to only integer ops, or if that includes float (add/mul).

[+] breadandcrumbel|6 years ago|reply

Food for the brain. I never thought about it before tbh