37ef_ced3|3 years ago
And I claim that that is the real problem with AVX-512 (and pretty much any vectorization). I personally cannot find a single benchmark that does anything I would ever do - not even remotely close. So if you aren't into some chess engine, if you aren't into parsing (but not using) JSON, if you aren't into software raytracing (as opposed to raytracing in games, which is clearly starting to take off thanks to GPU support), what else is there?
Answer? Neural net inference, e.g., https://NN-512.com. If you need a little bit of inference (say, 20 ResNet-50s per second per CPU core) as part of a larger system, there's nothing cheaper. If you're doing a small amount of inference, perhaps limited by other parts of the system, you can't keep a GPU fed, and the GPU is a huge waste of money.
AVX-512, with its masked operations and dual-input permutations, is an expressive and powerful SIMD instruction set. It's a pleasure to write code for, but we need good hardware support (which is literally years overdue).
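For anyone who hasn't used masked operations: each lane of the vector is gated by a bit in a mask register, so per-lane conditionals need no branches. A scalar model of merge-masking semantics, 8 lanes for brevity (illustrative sketch only; real code would use intrinsics like `_mm512_mask_add_epi32`):

```c
#include <stdint.h>

/* Scalar model of an AVX-512 merge-masked add: where the mask bit is
   set, dst[i] = a[i] + b[i]; where it is clear, dst[i] passes through
   src[i] untouched. */
static void masked_add(int32_t *dst, const int32_t *src,
                       const int32_t *a, const int32_t *b, uint8_t mask) {
    for (int i = 0; i < 8; i++)
        dst[i] = ((mask >> i) & 1) ? a[i] + b[i] : src[i];
}
```

The point is that the "if" disappears into the mask: the hardware executes all lanes unconditionally and the mask selects results, which is why data-dependent inner loops vectorize cleanly under AVX-512.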
dragontamer|3 years ago
There's... a ton of applications of AVX-512. I know that Linus loves his hot takes, but he's pretty ignorant on this particular subject.
I'd say that most modern computers are probably reading from TLS1.2 (aka: AES decryption), processing some JSON, and then writing back out to TLS1.2 (aka: AES Encryption), with probably some CRC32 checks in between.
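For the CRC32 part of that pipeline, the bit-at-a-time reference is tiny; SSE4.2's `crc32` instruction (CRC32C polynomial) or carry-less multiply does the same job orders of magnitude faster. A plain reference implementation of the zlib/Ethernet CRC32 for comparison:

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC32 using the reflected zlib/Ethernet polynomial
   0xEDB88320. Hardware replaces this inner loop with a single
   instruction per chunk of input. */
static uint32_t crc32_ref(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)-(int)(crc & 1));
    }
    return ~crc;
}
```

The standard check value for this polynomial is crc32("123456789") == 0xCBF43926, which makes it easy to sanity-check any faster implementation against this one.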
--------
Aside from that: CPU signal filtering (i.e., GIMP image processing, Photoshop, JPEG encoding/decoding, audio/musical stuff). There's also raytracing with more than the 8GB to 16GB found in typical GPUs (modern desktop CPUs support 128GB easily, and 2TB if you go server-class), and Moana back in 2016 was using 100+ GB per scene. So even if GPUs are faster, they still can't hold modern movie raytraced scenes in memory, so you're kinda forced to use CPUs right now.
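Most of that signal-filtering work boils down to FIR convolution, which is exactly the loop shape compilers can auto-vectorize into wide FMA lanes. A minimal scalar sketch (illustrative only, not taken from any of those products):

```c
#include <stddef.h>

/* Direct-form FIR filter: out[i] = sum over k of h[k] * in[i + k].
   The inner dot product has no loop-carried dependence, so a compiler
   targeting AVX2/AVX-512 (e.g. -O3 -march=native) can widen it. */
static void fir(float *out, const float *in, size_t n,
                const float *h, size_t taps) {
    for (size_t i = 0; i + taps <= n; i++) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps; k++)
            acc += h[k] * in[i + k];
        out[i] = acc;
    }
}
```

With taps = {0.5, 0.5} this is a two-point moving average; the same kernel shape covers audio EQ, image blurs, and JPEG-style transforms.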
ilyt|3 years ago
AES already has dedicated hardware on most x86 CPUs, and has for a good few years now. Fuck, I have some tiny ARM core with like 32kB of RAM somewhere that rocks AES acceleration...
> So even if GPUs are faster, they still can't hold modern movie raytraced scenes in memory, so you're kinda forced to use CPUs right now.
Can't GPUs just use system memory, at a performance penalty?
quelsolaar|3 years ago
You could say that the intersecting area in the Venn diagram of "has to run on the CPU" and "can use vector instructions" is small.
fulafel|3 years ago
Not to mention a host of other smaller problems (e.g. no standard way to write tightly coupled CPU/GPU code, spotty virtualization support in GPUs, lack of integration in established high-level languages, and other chilling factors).
The ML niche that can require specific kinds of Nvidia GPUs seems to be an island of its own that works for some things, but it's not great.
dtx1|3 years ago
People are forgetting the "could run on a GPU but I don't know how" factor. There are tons of situations where GPU offloading would be faster or more energy-efficient, but importing all the libraries, dealing with drivers, etc. really isn't worth the effort, whereas doing it on a CPU is really just a simple include away.
titzer|3 years ago
I dunno, JSON parsing is stupid hot these days because of web stacks. Given the neat parsing tricks by simdjson mentioned upthread, it seems like AVX512 could accelerate many applications that boil down to linear searches through memory, which includes lots of parsing and network problems.
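The simdjson trick alluded to is, at its core, a bitmask scan: compare a 64-byte chunk against a structural character in one instruction and get back a 64-bit mask of hit positions. A scalar model of that single step (hypothetical simplification, not simdjson's actual code):

```c
#include <stdint.h>
#include <stddef.h>

/* Build a bitmask with bit i set where buf[i] == target, over a
   64-byte chunk. AVX-512 produces this with one byte-compare into a
   mask register; the scalar loop below just models the result. */
static uint64_t match_mask(const unsigned char *buf, unsigned char target) {
    uint64_t mask = 0;
    for (size_t i = 0; i < 64; i++)
        if (buf[i] == target)
            mask |= (uint64_t)1 << i;
    return mask;
}
```

Once you have masks for quotes, braces, and so on, finding the next structural character is a trailing-zero count on the mask, which is why these parsers chew through gigabytes per second.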
mhh__|3 years ago
CPUs (let alone CPUs talking to a GPU) spend huge numbers of cycles shunting data around already.
dragontamer|3 years ago
Memcpy and memset are massively parallel operations used on a CPU all the time.
But let's ignore the _easy_ problems. AES-GCM mode is massively parallelizable as well: each 128-bit block of AES-GCM can be encrypted independently, so AVX-512's vector AES (VAES) can process 4 blocks in parallel per instruction.
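The reason those blocks are independent is that GCM's bulk encryption is CTR mode: block i encrypts (nonce, counter + i), and all counters are known up front. A sketch of that dependence structure with a toy stand-in cipher (an xor, purely illustrative; a real implementation calls AES-NI/VAES here):

```c
#include <stdint.h>
#include <stddef.h>

/* CTR-mode keystream: each 8-byte block encrypts nonce ^ (counter + i).
   Every iteration depends only on i, never on block i-1, so a SIMD
   implementation can compute several blocks per instruction. The
   "cipher" here is a toy xor with a key byte, NOT AES. */
static void ctr_keystream(uint8_t *out, size_t nblocks,
                          uint64_t nonce, uint64_t counter, uint8_t key) {
    for (size_t i = 0; i < nblocks; i++) {
        uint64_t block = nonce ^ (counter + i);   /* independent per block */
        for (int b = 0; b < 8; b++)
            out[i * 8 + b] = (uint8_t)(block >> (8 * b)) ^ key;
    }
}
```

Contrast with CBC encryption, where block i feeds into block i+1: that chain is inherently serial, which is exactly why GCM (and CTR generally) is the mode that benefits from wide vectors.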
Linus is just somehow ignorant of this subject...
paulmd|3 years ago
begging the fucking question to the max, post better linus