CmdDot | 8 years ago

Not really. Once a program is compiled with -mretpoline, new hardware won't bring back reliable branch prediction.

I'd hope maybe, just maybe, this would be enough to put a focus on compilers producing code that ends up using processor-optimized paths chosen at runtime, to avoid "overheads ranging from 10% to 50%".

Though, in this case, that would essentially mean making the entire executable region writable for some window of time, which is clearly too dangerous, so I guess the 0.1% speedups from compiling undefined behavior in new and interesting ways will continue taking priority.

I mean, it's a compiler flag, right? Obviously whoever's going to run a program on an unaffected platform will take the effort to recompile everything with the flag removed.

Just the same way every serious application currently provides different executables for running on systems where SSE2, SSE4.1, or AVX2 is present.
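
For reference, a minimal sketch of what "processor-optimized paths chosen at runtime" could look like without patching any code, assuming GCC or Clang on x86. The names sum_baseline, sum_avx2, pick_sum and sum_u32 are made up for illustration; __builtin_cpu_supports and the target attribute are compiler extensions, and on other compilers or architectures the sketch simply falls back to the baseline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Baseline version: plain C that runs on any CPU. */
static uint64_t sum_baseline(const uint32_t *v, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) total += v[i];
    return total;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same loop, but the compiler may emit AVX2 instructions inside this function. */
__attribute__((target("avx2")))
static uint64_t sum_avx2(const uint32_t *v, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) total += v[i];
    return total;
}
#endif

typedef uint64_t (*sum_fn)(const uint32_t *, size_t);

/* Pick an implementation by asking the CPU what it supports. */
static sum_fn pick_sum(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return sum_avx2;
#endif
    return sum_baseline;
}

/* Entry point: resolves on first use (not thread-safe in this sketch),
 * then always calls through the stored pointer. */
uint64_t sum_u32(const uint32_t *v, size_t n) {
    static sum_fn impl = NULL;
    if (impl == NULL) impl = pick_sum();
    assert(impl != NULL);
    return impl(v, n);
}
```

Note that the call through impl is itself an indirect branch, which is exactly the kind of instruction a retpoline build rewrites.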

maxerickson|8 years ago

Horizontal scaling though. If every individual processor is slower, more are needed.

mike_hearn|8 years ago

Not quite - lots of "serious" applications these days are written to target JIT compilers, which would be capable of switching retpoline on and off depending on need.

CmdDot|8 years ago

Funnily enough, I ended up not including a PS starting with "A sufficiently smart JIT, however..." ;)

gmueckl|8 years ago

I'd rather have linkers go down a road similar to the one the Linux kernel took over a decade ago: provide binary patches in a table (essentially alternative machine code) and have the linker apply the correct alternative depending on the CPU and its bugs. The Linux kernel already contains an "alternatives" segment which is exactly this kind of list of patches. It would be trivial to add such a table to the ELF and PE formats and have the runtime linker process it while it's plowing through the code anyway.
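
A rough sketch of the table idea, assuming a made-up entry layout and made-up feature bits (struct alt_patch, CPU_NEEDS_RETPOLINE and CPU_HAS_FAST_COPY are hypothetical and do not mirror the kernel's actual format). It patches a plain byte buffer standing in for the code image, so it shows only the table walk, not the page-permission handling a real loader would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU feature/bug bits; not taken from any real header. */
enum { CPU_NEEDS_RETPOLINE = 1u << 0, CPU_HAS_FAST_COPY = 1u << 1 };

/* One table entry: if the CPU has all `required` bits,
 * overwrite `len` bytes at `offset` with `replacement`. */
struct alt_patch {
    size_t         offset;
    uint32_t       required;
    const uint8_t *replacement;
    size_t         len;
};

/* Apply every matching entry to the image; returns how many were applied. */
size_t apply_alternatives(uint8_t *image, size_t image_len,
                          const struct alt_patch *table, size_t count,
                          uint32_t cpu_flags) {
    assert(image != NULL && (table != NULL || count == 0));
    size_t applied = 0;
    for (size_t i = 0; i < count; i++) {
        const struct alt_patch *p = &table[i];
        /* Skip entries that do not apply to this CPU. */
        if ((cpu_flags & p->required) != p->required) continue;
        /* Skip entries that would write past the end of the image. */
        if (p->offset > image_len || p->len > image_len - p->offset) continue;
        memcpy(image + p->offset, p->replacement, p->len);
        applied++;
    }
    return applied;
}
```

A loader would call something like this once per loaded object, before the code pages are made executable, which avoids leaving the text writable while the program runs.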

lower|8 years ago

Something like this exists with function multi-versioning: https://lwn.net/Articles/691932/

For example, glibc chooses optimised machine code for memcpy depending on the CPU it runs on.
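
As a small sketch of what that looks like from the programmer's side, assuming GCC or a recent Clang on x86-64 with glibc: the target_clones attribute asks the compiler to emit one copy of the function per listed target plus a resolver that picks one at load time. The function dot_u32 and the MULTIVERSION macro are invented for this example; on toolchains without the attribute the macro expands to nothing and a single baseline version is built.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Use target_clones only where the compiler and C library are known to offer it. */
#if defined(__GNUC__) && defined(__x86_64__) && defined(__GLIBC__) && defined(__has_attribute)
#  if __has_attribute(target_clones)
#    define MULTIVERSION __attribute__((target_clones("avx2", "default")))
#  endif
#endif
#ifndef MULTIVERSION
#  define MULTIVERSION /* fall back to a single, baseline version */
#endif

/* One source function; the compiler may emit an AVX2 clone and a default
 * clone, and the dynamic loader selects between them for the running CPU. */
MULTIVERSION
uint64_t dot_u32(const uint32_t *a, const uint32_t *b, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) total += (uint64_t)a[i] * b[i];
    return total;
}
```

Callers just call dot_u32 as usual; the selection happens once when the symbol is resolved rather than on every call.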

ant6n|8 years ago

New CPUs could just convert the retpoline back to the original jump in microcode, and enable the now timing-attack-safe branch predictor.

gmueckl|8 years ago

But even then a performance hit remains due to the increased code size of the instruction sequence.