top | item 5598201

DarkShikari | 13 years ago

I only pushed the code a few minutes ago, but binaries should probably be up at http://x264.nl/ relatively soonish (it's not my site though, so I wouldn't know exactly).

If you want to test without a physical Haswell, the Intel Software Development Emulator should work okay, albeit somewhat slowly. I'd post overall numbers for real Haswells, but Intel has apparently said we can't do that yet.

Regarding FMA, FMA3/4 are floating point only. Since x264 has just one floating point assembly function, only two FMA3/FMA4 instructions get used in all of x264 (not counting duplicates from different-architecture versions of the function). An FMA4 version has been included for a while; the new AVX2 version does include FMA3, but of course that won't run on AMD CPUs (yet).

XOP had some integer FMA instructions, but I generally didn't find them that useful (there are a few places I found they could be slotted in, though).

jamesaguilar|13 years ago

I've heard that there are C libraries for things like SSE2. I assume the same is true of AVX2. If so, why do you write so much of x264 in assembly? Do you find that there are significant gains versus C code that uses SIMD libraries? Have I been misled that C is nearly as fast as assembly 99% of the time?

Note: I'm not trying to question your engineering chops, just trying to correct my own misconceptions.

DarkShikari|13 years ago

"C libraries for things like SSE2"? Do you mean math libraries that have SIMD implementations of various functions that are callable from C? This here is effectively writing those libraries; they don't exist until we write the code.

pjmlp|13 years ago

> I've heard that there are C libraries for things like SSE2...

Those are not C code, but rather inline assembly or compiler intrinsics, neither of which has anything to do with C.
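
For readers unfamiliar with the term, here is a minimal sketch of what "compiler intrinsics" look like next to plain C, using a 16-byte sum of absolute differences as the example. This is not x264's code (x264 uses hand-written assembly); the function names `sad16_c` and `sad16_sse2` are hypothetical, and the sketch assumes an x86 or x86-64 compiler that provides `<emmintrin.h>`.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86/x86-64 target */
#include <stdint.h>
#include <stdlib.h>

/* Plain C reference: sum of absolute differences over 16 bytes.
 * (Hypothetical helper for illustration, not taken from x264.) */
static int sad16_c(const uint8_t *a, const uint8_t *b)
{
    int sum = 0;
    for (int i = 0; i < 16; i++)
        sum += abs(a[i] - b[i]);
    return sum;
}

/* Same computation with SSE2 intrinsics. Each intrinsic maps more or
 * less directly onto one instruction (movdqu, psadbw, psrldq, movd). */
static int sad16_sse2(const uint8_t *a, const uint8_t *b)
{
    /* Unaligned 128-bit loads of the two 16-byte blocks. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* psadbw: produces two partial sums, one in the low bits of
     * each 64-bit half of the register. */
    __m128i s = _mm_sad_epu8(va, vb);

    /* Add the low-half sum to the high-half sum. */
    return _mm_cvtsi128_si32(s)
         + _mm_cvtsi128_si32(_mm_srli_si128(s, 8));
}
```

Both functions should return the same value for any pair of 16-byte inputs. The intrinsic version is still compiled by the C compiler, which chooses registers and instruction scheduling; hand-written assembly takes over those decisions as well.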