top | item 6502729

cdtwigg | 12 years ago

Auto-vectorization is impossible for a lot of real-world code because it requires changing how data is laid out in memory. Notice that the AVX version of the raytracer actually involves packing blocks of x components into a single 256-bit-wide variable. Realistically, a compiler is not going to be smart enough to figure that out.

corresation | 12 years ago

Absolutely true, though of course you could do the same memory layout with the Go code. If we're talking about compiler comparisons, a vectorization-suitable tight inner loop that operates on contiguous memory would be a good high-performance comparison. The standard Go compiler would not vectorize it...yet...though honestly I don't know what the state of gccgo is, or whether it yields an intermediate representation that brings gcc's vectorizer into play.

And of course the reason you code for auto-vectorization is ease of platform support. The linked AVX code will not run on the vast majority of virtual machines, or on any CPU made prior to 2012. Nor will it take advantage of AVX2. I use the Intel compiler, and I can either produce builds targeted at specific processors or technology levels, or add support for virtually all of them, such that the same code will vectorize to AVX2, failing that AVX, failing that SSE4.2, failing that... you get the picture. With a suitable ARM compiler the same code would vectorize to NEON, etc.