mdb31 | 3 years ago

Cool performance enhancement, with an accompanying implementation in a real-world library (https://github.com/lemire/despacer).

Still, what does it signal that vector extensions are required to get better string performance on x86? Wouldn't it be better if Intel invested their AVX transistor budget into simply making existing REPB prefixes a lot faster?

37ef_ced3|3 years ago

AVX-512 is an elegant, powerful, flexible set of masked vector instructions that is useful for many purposes, for example low-cost neural net inference (https://NN-512.com). To suggest that Intel and AMD should instead make "existing REPB prefixes a lot faster" is missing the big picture. The masked compression instructions (one of which is used in Lemire's article) are endlessly useful, not just for stripping spaces out of a string!
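For readers unfamiliar with masked compression, here is a minimal scalar sketch in C of what the operation does when used to strip whitespace. It is not the code from Lemire's article or from the despacer library: the function names (`keep_mask`, `compress_store`, `despace_sketch`) are hypothetical, and the choice to drop only `' '`, `'\n'` and `'\r'` is an assumption. On AVX-512 hardware the two inner loops would roughly correspond to one vector compare that produces a 64-bit mask and one byte-compress instruction (VPCOMPRESSB, part of AVX-512 VBMI2) per 64-byte block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build a mask with bit i set when block[i] should be kept.
   Assumption: only ' ', '\n' and '\r' count as whitespace here. */
static uint64_t keep_mask(const char *block, size_t n) {
    assert(n <= 64); /* one mask bit per byte of a 64-byte block */
    uint64_t mask = 0;
    for (size_t i = 0; i < n; i++) {
        char c = block[i];
        if (c != ' ' && c != '\n' && c != '\r')
            mask |= (uint64_t)1 << i;
    }
    return mask;
}

/* Scalar stand-in for a masked byte compress: copy the bytes whose
   mask bit is set to dst, packed contiguously. Returns the count. */
static size_t compress_store(char *dst, const char *block,
                             uint64_t mask, size_t n) {
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if ((mask >> i) & 1)
            dst[out++] = block[i];
    }
    return out;
}

/* Hypothetical driver: process the input in blocks of up to 64 bytes.
   dst must have room for len bytes. Returns the number of bytes written. */
size_t despace_sketch(char *dst, const char *src, size_t len) {
    size_t out = 0;
    for (size_t pos = 0; pos < len; pos += 64) {
        size_t n = (len - pos < 64) ? (len - pos) : 64;
        uint64_t mask = keep_mask(src + pos, n);
        out += compress_store(dst + out, src + pos, mask, n);
    }
    return out;
}
```

The same mask-then-compress pattern works for any per-element predicate (filtering values, packing selected elements of an array), which is the generality the comment is pointing at.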

mhh__|3 years ago

Many people seem to think AVX-512 is just wider AVX, which is a shame.

NN-512 is cool. I think the Go code is pretty ugly but I like the concept of the compiler a lot.

janwas|3 years ago

Why is a large speedup from vectors surprising? Considering that the energy required for scheduling/dispatching an instruction on OoO cores dwarfs that of the actual operation (add/mul etc), amortizing over multiple elements (=SIMD) is an obvious win.

mdb31|3 years ago

Where do I say that the speedup is surprising?

My question is whether Intel investing in AVX-512 is wise, given that:
- Most existing code is not aware of AVX anyway;
- Developers are especially wary of AVX-512, since they expect it to be discontinued soon.

Consequently, wouldn't Intel be better off by using the silicon dedicated to AVX-512 to speed up instruction patterns that are actually used?

ip26|3 years ago

Is it generally possible to convert rep str sequences to AVX? Could the hardware or compiler already be doing this?

AVX is just the SIMD unit. I would argue the transistors were spent on SIMD, and the hitch is simply the best way to send str commands to the SIMD hardware.

nwmcsween|3 years ago

Why? IIRC something like 99% of string operations are on 20 chars or less. If you're hitting bottlenecks then optimize.