item 36398147


mattst88|2 years ago

I am a happy owner of a Tiger Lake (Intel 11th Gen) Framework laptop. I've considered upgrading to a 12th or 13th Gen motherboard, and while I have no doubt the greatly increased core counts would be great for me as a Gentoo developer, I hesitate because the new CPUs have AVX-512 disabled.

Maybe this doesn't matter (it almost certainly wouldn't for most people), but I'm compiling the whole system myself, so the compiler at least has the freedom to use AVX-512 wherever it pleases. Does anyone know if AVX-512 actually makes a difference in workloads that aren't specifically tuned for it?

Given news like https://www.phoronix.com/news/GCC-AVX-512-Fully-Masked-Vecto..., my guess is that compilers basically don't do anything interesting with AVX-512 without hand-written code.


mtklein|2 years ago

The promise of the AVX-512 instruction set really was that it would be much easier to (auto-)vectorize code that wasn’t written with vectorization in mind, with tools like masked execution and gather/scatter that either didn’t exist at all before (SSE) or were very limited (AVX/AVX2).
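To make "code that wasn’t written with vectorization in mind" concrete, here is a sketch of three ordinary C loops of the kind masking and gather/scatter are aimed at. The function names are made up for illustration, and whether a given GCC or Clang version actually vectorizes each one (and with which instructions) is something to check, not a given.

```c
#include <stddef.h>

/* Conditional store: only some elements are written. A masked store can
   write just the lanes where the condition holds, so the vectorizer does
   not have to invent writes to the other elements. */
void clamp_negatives(float *x, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (x[i] < 0.0f)
            x[i] = 0.0f;
}

/* Indexed load: each element comes from a data-dependent address, which
   maps to a gather (AVX2 has gathers; AVX-512 adds mask-register gathers). */
void lookup(float *restrict out, const float *restrict table,
            const int *restrict idx, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = table[idx[i]];
}

/* Indexed store: needs a scatter, which on x86 only AVX-512 provides.
   If idx contains duplicates, the last write in loop order must win. */
void scatter_store(float *restrict out, const int *restrict idx,
                   const float *restrict in, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[idx[i]] = in[i];
}
```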

The tools are there in the instruction set, but before any of this becomes real, someone still has to spend the time and effort implementing it in compilers, and some market (browsers, games, etc.) needs enough machines that can run it, with enough of a performance improvement, to make it worthwhile.

The Skylake Xeon/Ice Lake false start here really can’t have helped. It’s still much more pragmatic to target the Haswell feature set that most Intel and AMD chips can run (and run well).

johnklos|2 years ago

Funny that if you want AVX-512 now, it's AMD that's offering it and Intel that isn't.

Sometimes the second comer to a game has the advantage of taking their time to implement something, with fewer compromises and a better overall fit.

jeffbee|2 years ago

The compiler will only choose to use AVX-512 if you give it the right `-m` flags. Most people running generic distros that target the baseline K8 (x86-64) instruction set benefit from AVX-512 only when some library has runtime dispatch that detects the presence of the feature and enables optimized routines. This is common in, for example, cryptography libraries.
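A minimal sketch of that runtime-dispatch pattern, assuming GCC or Clang on x86-64: `__attribute__((target("avx512f")))` compiles one function with AVX-512 enabled while the rest of the file stays at the baseline, and `__builtin_cpu_supports` picks a version at run time. The function names are hypothetical, and real libraries often hand-write the fast path with intrinsics or assembly instead of relying on the auto-vectorizer as this sketch does.

```c
#include <stddef.h>

/* Same loop body twice: this copy may use AVX-512 instructions. */
__attribute__((target("avx512f")))
static void add_avx512(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Baseline copy, compiled with whatever flags the rest of the file uses. */
static void add_generic(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

typedef void (*add_fn)(float *, const float *, const float *, size_t);

/* Choose an implementation based on what the running CPU reports. */
add_fn select_add(void) {
    __builtin_cpu_init();  /* harmless if the feature bits are already set up */
    if (__builtin_cpu_supports("avx512f"))
        return add_avx512;
    return add_generic;
}

/* Public entry point. Checking per call keeps the sketch simple; a real
   library might cache the pointer or use an ifunc resolver instead. */
void add_arrays(float *dst, const float *a, const float *b, size_t n) {
    select_add()(dst, a, b, n);
}
```

The same binary then runs on a baseline x86-64 machine and only takes the AVX-512 path where the CPU supports it.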

mattst88|2 years ago

Right. Since I'm using Gentoo and compiling my whole system with `-march=tigerlake`, the compiler is free to use AVX-512.

My question is just... does it? (And does it use AVX-512 profitably?)
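One way to answer "does it?" for a particular loop is to compile a small test case and look at what comes out. This is a sketch: the file name `saxpy.c` is arbitrary, and the exact output varies by compiler version.

```c
#include <stddef.h>

/* A plain loop with no intrinsics: whether it becomes AVX-512 code is
   entirely up to the compiler and its flags. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

For example, `gcc -O3 -march=tigerlake -fopt-info-vec-optimized -S saxpy.c` reports which loops were vectorized and writes the assembly to `saxpy.s`; `zmm` registers or `{%k1}`-style mask operands indicate AVX-512. A ymm-only result is not conclusive, since AVX-512VL instructions also operate on ymm registers, and GCC and Clang tend to prefer 256-bit vectors by default when tuning for Intel AVX-512 parts; `-mprefer-vector-width=512` asks for full-width vectors so you can compare. Whether it is profitable is a separate question that only benchmarking answers.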

PragmaticPulp|2 years ago

> I've considered upgrading to a 12th or 13th Gen motherboard, and while I have no doubt they'd be great for me as a Gentoo developer with the greatly increased core counts, my hesitation is that the new CPUs have AVX-512 disabled.

Unless you have a very specific AVX-512 workload or you need to run AVX-512 code for local testing, you won’t see any net benefit from keeping your older AVX-512 part.

Newer parts will have higher clock speeds and better performance that will benefit you everywhere. Passing that up on the chance that some future workload might benefit from AVX-512 is a net loss.

adrian_b|2 years ago

Now you may choose a new AMD Phoenix-based laptop with great AVX-512 support (e.g. with a Ryzen 7 7840HS, Ryzen 9 7940HS, or Ryzen 7 7840U).

AMD Phoenix is far better than any current Intel mobile CPU anyway, so it is an easy choice (and it compiles code much faster than Intel Raptor Lake, which matters for a Gentoo user or developer).

The only reason to not choose an AMD Phoenix for an upgrade would be to wait for an Intel Meteor Lake a.k.a. Intel Core Ultra. Meteor Lake will be faster in single-thread (the relative performance in multi-thread is unknown) and it will have a bigger GPU (with 1024 FP32 ALUs vs. 768 for AMD).

However, Meteor Lake will not have AVX-512 support.

For compiling code, the AVX-512 support should not matter, but it should matter a lot for the code generated by the compiler, as it enables the efficient auto-vectorization of many loops that cannot be vectorized efficiently with AVX2.
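As one assumed example of such a loop (the comment doesn't name one): stream compaction, where the output position depends on how many earlier elements passed the test. AVX-512F's compress instructions (such as `vcompressps`) store only the selected lanes contiguously, and AVX2 has no direct equivalent. Whether current GCC or Clang auto-vectorize this particular loop is version-dependent and worth checking rather than assuming; the function name is hypothetical.

```c
#include <stddef.h>

/* Keep only elements above a threshold, preserving order.
   The write index j depends on all previous comparisons, which is what
   makes a straightforward AVX2 vectorization awkward. A compress-store
   can write just the passing lanes of each vector back to back. */
size_t filter_above(float *restrict out, const float *restrict in,
                    size_t n, float threshold) {
    size_t j = 0;
    for (size_t i = 0; i < n; i++)
        if (in[i] > threshold)
            out[j++] = in[i];
    return j;  /* number of elements written to out */
}
```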

While GCC and Clang will never be as good as hand-written code, their automatic use of AVX-512 can be improved a lot, and announcements like the one you linked show progress in this direction.

causality0|2 years ago

> Does anyone know if AVX-512 actually makes a difference in workloads that aren't specifically tuned for it?

I know game console emulators use it to great effect with significant performance increases.

jsheard|2 years ago

Incidentally, that's another case where the 512-bit-ness is the least interesting part. The new instructions are useful for efficiently emulating ARM NEON (Switch) and Cell SPU (PlayStation 3) code, but those platforms are themselves only 128 bits wide, so I don't believe the emulators have any use for the 512-bit (or even 256-bit?) variants of the AVX-512 instructions.
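The comment doesn't say which instructions matter, so here is one plausible example, as an assumption rather than something stated in the thread: AVX-512's `vpternlogd` evaluates any three-input bitwise function from an 8-bit truth table, which covers NEON's bitwise select (BSL) in one instruction, and AVX-512VL makes it available at 128-bit width. This scalar C model only shows how the immediate works; it is not emulator code, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Scalar model of VPTERNLOGD on one 32-bit lane: at each bit position,
   the bits of a, b, c form the index (a<<2 | b<<1 | c) into imm, and the
   selected bit of imm becomes the result bit. */
uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm) {
    uint32_t r = 0;
    for (int bit = 0; bit < 32; bit++) {
        unsigned idx = (((a >> bit) & 1u) << 2)
                     | (((b >> bit) & 1u) << 1)
                     | ((c >> bit) & 1u);
        r |= (uint32_t)((imm >> idx) & 1u) << bit;
    }
    return r;
}

/* NEON-style bitwise select: bits from b where mask is 1, from c where it
   is 0. SSE/AVX2 need and/andnot/or for this; the truth table "a ? b : c"
   is immediate 0xCA, so ternlog does it in one step. */
uint32_t bitselect(uint32_t mask, uint32_t b, uint32_t c) {
    return ternlog32(mask, b, c, 0xCA);
}
```

Other immediates give other functions, e.g. 0x96 is a three-way XOR.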

saagarjha|2 years ago

Game console emulators are of course specifically tuned for this.

Narishma|2 years ago

What other emulators besides RPCS3 use it?

Tuna-Fish|2 years ago

AVX-512 is specifically the first x86 vector extension for which compilers should eventually be able to emit reasonable auto-vectorized code. Thanks to gather and masked execution, vectorizing a simple loop with AVX-512 doesn't always mean blowing up code size 10x.

However, compilers have so far been slow to implement this, with the relevant patches only going into GCC right now.
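The code-size point about masked execution can be modeled in scalar C. This sketch assumes 16 float lanes per 512-bit register and uses ordinary inner loops to stand in for single vector instructions, so it shows control structure only, not real code generation; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16  /* 16 x 32-bit floats per 512-bit register */

/* "Fully masked" structure: every iteration handles LANES elements under a
   lane mask, and the last partial chunk just has fewer mask bits set, so
   there is no separate remainder loop. */
void scale_masked_model(float *y, const float *x, float a, size_t n) {
    for (size_t i = 0; i < n; i += LANES) {
        size_t remaining = n - i;
        uint16_t mask = remaining >= LANES
                            ? 0xFFFFu
                            : (uint16_t)((1u << remaining) - 1u);
        for (int j = 0; j < LANES; j++)
            if (mask & (1u << j))  /* inactive lanes are neither loaded nor stored */
                y[i + j] = a * x[i + j];
    }
}

/* Unmasked structure: a "vector" main loop over whole chunks plus a scalar
   epilogue. Real compiler output may add further copies (e.g. for
   alignment or narrower vectors), which is where code size grows. */
void scale_unmasked_model(float *y, const float *x, float a, size_t n) {
    size_t i = 0;
    for (; i + LANES <= n; i += LANES)  /* full chunks only */
        for (int j = 0; j < LANES; j++)
            y[i + j] = a * x[i + j];
    for (; i < n; i++)                  /* leftover elements, one at a time */
        y[i] = a * x[i];
}
```

Both produce the same results; the difference is that the masked version has one loop body to generate instead of two or more.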