item 16070050

LLVM patch to fix half of Spectre attack

433 points | Kristine1975 | 8 years ago | reviews.llvm.org

242 comments


tptacek|8 years ago

Page was down when I tried to read it, but it's archived here: http://archive.is/s831k.

It's hard to get your head around how big a deal this is. This vulnerability is so bad they killed x86 indirect jump instructions. It's so bad compilers --- all of them --- have to know about this bug, and use an incantation that hacks ret like an exploit developer would. It's so bad that to restore the original performance of a predictable indirect jump you might have to change the way you write high-level language code.

It's glorious.

jasode|8 years ago

>It's hard to get your head around how big a deal this is.

It truly is difficult to predict all the ripple effects from this. I can't think of a single computer bug in the last 30 years that's similar in reach to this Spectre bug.

[EDITED following text to replace "Intel bug" with "Spectre bug" based on ars and jcranmer clarification. The Intel Meltdown bug can be fixed with operating-system patches (KPTI) instead of a complete recompile.]

Journalists like to overuse the bombastic metaphor "shaken the very foundations" but this Spectre bug actually seems very fitting of it. Off the top of my head:

- browsers like Chrome & Firefox have to be compiled with new defensive compilation flags because they run untrusted JavaScript

- cloud providers have to recompile and patch their code to protect themselves from hostile customer VMs

- operating systems like Linux/Windows/MacOS have to recompile and patch code to protect users from malware

Imagine the economics of all these mitigations. Also imagine that each of the cloud vendors AWS/Google/Azure/Rackspace had very detailed Excel spreadsheets extrapolating CPU usage for the next few years to plan for millions of $$$ of capital expenditures. Because of the severe performance implications of the bugfix (5% to 50% slowdown?), the CPU-utilization assumptions in those spreadsheets are now wrong. They will have to spend more than they had planned to meet their workload-throughput goals.

There are dozens of other scenarios that we can't immediately think of.

voidmain|8 years ago

And I fear there's little reason to think that the "three variants" from project zero's announcement are the full scope of the problem. They were just the variants that the few people in on this found time to develop exploits for. There can now be security bugs in things your program doesn't do; it seems like there is room for nearly unlimited creativity in finding them.

From the spectre paper:

"A minor variant of this could be to instead use an out-of-bounds read to a function pointer to gain control of execution in the mis-speculated path. We did not investigate this variant further."

rayiner|8 years ago

Is glorious the right word for it? We’re going back to the stone ages where processors couldn’t predict the targets of indirect jumps. More generally, this seems to me like an attempt to patch out of what is really a class of attacks leveraging fundamental assumptions about high-performance CPU design. Before, OOO just had to preserve correctness and (some of) the order of exceptions and memory operations. Now, it has to preserve (some of) the timing of in-order execution too? Where does this path end?

js2|8 years ago

CPUs have been vulnerable to this attack since 1995. How did it collectively take us 22 years to figure this out? I know it's a highly esoteric complex attack, but there's no shortage of clever hackers in the world.

jmull|8 years ago

Well, these are workarounds because fixing the problem at the source is hard.

The right fix is to prevent speculatively executed code from leaking information.

Here that perhaps means associating cache lines with a speculative branch somehow so that they aren't accessible until/unless the speculative branch becomes the real branch. (I have no idea exactly how that would be done or what the performance cost might be... I'd really need to know the details of how speculative execution is implemented in a particular CPU to even be able to guess.)

jncraton|8 years ago

Agreed. I haven't had this much fun thinking through the implications of a new exploit technique in a long time. It is truly beautiful.

eric_b|8 years ago

Prediction: This will be just like any vulnerability disclosure. The infosec people and media will scream hysterically about how game-changingly bad it is. The OS vendors will patch, and business will go on as usual.

leeoniya|8 years ago

i know this came out as a leak, but makes one wonder how "responsible" even a Jan 9 official announcement would have been. the scope is absolutely terrifying. this bug will be exploitable for a very long time.

dzdt|8 years ago

> When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch-, indirect-, or virtual-call-heavy we have seen overheads ranging from 10% to 50%.

Ouch! This is independent of other performance hurts, like from the kernel syscall overhead that was the hot topic yesterday. This is pretty crazy.

jerf|8 years ago

That's bad. A single 5% hit might not be the end of the world, but 5% here and 10% there and another 5% over there in the common case adds up badly enough. Doubly-pathological cases (indirect-call-heavy code making lots of syscalls)... a 50% slowdown and a 30% slowdown compound to a 65% total slowdown (0.5 × 0.7 = 0.35 of the original speed). Yeowch.

Will be intrigued to see how processor manufacturers respond to this. If they were even slightly relaxed about it prior to disclosure I expect there's going to be some very hurried attempts to engineer some solutions pronto. This is the sort of thing where it might even be worth throwing away all of your future roadmap plans and just getting a revision of the current chips out there ASAP, whatever that may do to the rest of your roadmap.

Paul-ish|8 years ago

Will linux distributions automatically use this compilation option (or its analog in GCC) for packages from now until forever, even if a faster mitigation is added to CPUs?

vfaronov|8 years ago

Not to worry, it’s “just” 5–10% for “well tuned servers using all of [performance-saving] techniques”.

nindalf|8 years ago

The sentence that follows the line you quoted is

> However, real-world workloads exhibit substantially lower performance impact.

I feel like you could have mentioned this.

tzahola|8 years ago

I thought it would be the end of Moore's law that forced people to care about their code's performance. I was wrong, but am nevertheless happy about the recent developments. Programming will become an art once again :)

crb002|8 years ago

Agreed. dlopen() should wipe branch-prediction caches by default; we need additional flags to turn that off.

chrisper|8 years ago

It's ok. The 9th generation of Intel will be 50% faster and the most secure CPU ever made! /s

AaronFriel|8 years ago

This is brutal for all interpreted/JITed languages and all statically compiled languages with dynamic dispatch. I can hardly imagine worse news for performance oriented engineers. And what's worse is that dynamic libraries will probably need to be rebuilt with these mitigations in mind, so nearly everyone will pay the cost even if they don't need it.

I feel bad for all of the engineers currently working on performance sensitive applications in these languages. There's a whole lot of Java, .NET, and JavaScript that's about to get slower[1]. Enterprise-y, abstract class heavy (i.e.: vtable using) C++ will get slower. Rust trait objects get slower. Haskell type classes that don't optimize out get slower.

What a mess.

[1] These mitigations will need to be implemented for interpreters, and JITs will want to switch to emitting "retpoline" code for dynamic dispatch. There's no world in which I don't expect the JVM, V8, and others to switch to these by default soon.

rntz|8 years ago

This mitigates spectre variant #2, branch target injection. We also have a mitigation for meltdown, namely KPTI. Is there a known mitigation for spectre variant #1, bounds check bypass?

Maybe I'm being naive, but would a simple modulo instruction work? Consider the example code from https://googleprojectzero.blogspot.com/2018/01/reading-privi...:

    unsigned long untrusted_offset_from_caller = ...;
    if (untrusted_offset_from_caller < arr1->length) {
        unsigned char value = arr1->data[untrusted_offset_from_caller];
        ...
    }
If instead we did:

    unsigned char value = arr1->data[untrusted_offset_from_caller % arr1->length];
Would this produce a data dependency that prevents speculative execution from reading an out-of-bounds memory address? (Ignore for the moment that a sufficiently smart compiler might "optimize" out the modulo here.)

jzl|8 years ago

A new thing that's going to become a standard part of systems engineering: deciding whether any given system needs to run with or without these kinds of protections. Do you want the speed of speculative execution or do you want Meltdown/Spectre protection? In some cases lack of protection is fine. But figuring out the answer for any given system is often going to take expert-level security knowledge. Security is all about multiple layers of protection, and even a non-public facing machine might benefit from these layers depending on the context.

s4vi0r|8 years ago

Spectre relies on tricking the CPU into branch predicting its way into accessing protected memory, no? Is it not possible that we can keep most of the performance benefits of speculative execution by somehow having a built in "Hey, never ever speculate that I'll want to access this region of memory" sort of thing?

crb002|8 years ago

CPUs should have a single instruction that wipes branch-prediction caches. I would have it off by default, and add this to the C/C++ spec as a standard library macro or pragma. Easy peasy.

You only need to wipe between syscalls that have side effects. Number-crunching, AVX-heavy subroutines should never have to deal with safety once entered.

ece|8 years ago

More likely, this is a shift back to in-order processors, if the solutions aren't workable. If you're in an embedded scenario, sure you can make more trade-offs and have more control, but it's not going to look great when it happens to get hacked.

leni536|8 years ago

It has an interesting performance impact on calls to dynamic libraries. One alternative would be to avoid the indirect calls by building shared libraries with '-mcmodel=large --shared' instead of '-fPIC --shared'. This causes the relocations to happen at the direct call sites rather than through a GOT.

The obvious drawback is that it effectively disables sharing code in memory, though it would still allow sharing code on disk. So it would be a middle ground between the current dynamic and static linking.

https://www.technovelty.org/c/position-independent-code-and-...

ealexhudson|8 years ago

This patch apparently implements this mitigation: https://support.google.com/faqs/answer/7625886

kough|8 years ago

This is a really good writeup, thanks. I'm curious -- how often are google support faq articles deeply technical like this?

badrequest|8 years ago

I, for one, am eternally grateful for the incredibly bright people who take the time to patch this sort of stuff.

ben_jones|8 years ago

And the people who invented computers, programming languages, the internet, and all the learning resources, that allow me to get a paycheck writing extremely high level application code that feels like a coloring book in comparison. Truly the shoulders of giants.

jacksmith21006|8 years ago

Also to Google for finding and documenting it so well. Google security team really should be given an award.

vfaronov|8 years ago

I have a hunch that the era of side-channel attacks is only now dawning, and that we should expect many more painful exploits and cumbersome mitigations in the coming years.

What do people more knowledgeable in the field think about this?

xigency|8 years ago

What about users who only execute trusted code?

All of these attacks assume you are running something you don't trust on your CPU, whether it is another user's program, a non-root executable, or a JavaScript program from a website.

When do we stop hacking processors, kernels, and compilers and revisit our assumptions about what we can and can't do securely?

Klathmon|8 years ago

I'm not more knowledgeable than you, but I think I agree.

side channels have always been some of the most insidious exploits. Many are basically un-solvable (timing attacks are always going to leak some information, and compression is basically completely at odds with secure information storage), many more are easily enough overlooked that it would be easy to maliciously include them without raising any eyebrows, and the "fixes" for them almost always murder performance.

I think the only real fully-encompassing solution to this is a redesign in how we use computers. Either a massive step backwards on performance and turning off most automatic "optimizations" until they can be proven through a much more rigorous process (both in compilers, and in hardware), or a significant change in how computers are architected adding more hardware level isolation for processes and systems running on the machine (just daydreaming now, but something like a cluster of isolated micro-CPUs that run one application only).

phkahler|8 years ago

RISC-V impact? With all the reports of these attacks, I have not seen mention of risc-v. Since they are in the process of finalizing a lot of specs including memory model and privileged instructions, I wonder if there will be last minute changes to mitigate these vulnerabilities.

bem94|8 years ago

The problem (in my understanding) is not with the specification of the x86 ISA, but with the implementation of the speculative execution micro-architecture and probably the memory sub-system as well. That is why Intel is so badly affected by the problem, but not AMD, despite them both implementing the same instruction set.

RISC-V has already had to fix its memory consistency model, so it is not without problems. But that is a spec bug, not an implementation bug. As far as I know it's very unlikely that there is an out-of-order, speculatively executing RISC-V core in the wild that suffers from this. If there is, no doubt its designers have had a busy time lately.

Tuna-Fish|8 years ago

The details that this attack depends on are outside the architecture of the system, in the microarchitecture. A cpu of almost any architecture can be vulnerable or not depending on how it was implemented, thus Ryzen is immune to the worst variant while both Intel and the fastest Arm cpus are vulnerable.

I'd presume that the slowest RISC-V designs are immune due to not speculating enough, while any high-performance implementation is vulnerable.

ars|8 years ago

As of right now, every single CPU that does speculative execution. (I.e., one that predicts which way a branch will go and executes ahead down that path, throwing away the work when the prediction turns out to be wrong.)

Keyframe|8 years ago

RISC-V is an ISA, so it depends on the implementation.

bpye|8 years ago

I imagine BOOM and BOOM v2 may be vulnerable as they support OoO execution.

coldcode|8 years ago

I remember doing tricks like this in 6502 assembly and in other early processors. Amazing that to stop these attacks you have to come up with clever tricks again. Back in the 80's I would have never imagined this type of attack being something to worry about.

FLUX-YOU|8 years ago

>early processors

Early processors had speculative execution? I thought this had been added to Intel/AMD/ARM about 20 years ago?

peapicker|8 years ago

This brings to mind Ken Thompson's "Reflections on Trusting Trust"[1] -- after all, all I have to do to write code with the exploit is be able to remove the patch and rebuild the compiler and build some executables.

Trusting that the compiler used to build all the executables on your system was the patched one isn't enough to be the final solution.

[1] https://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html

pwg|8 years ago

Most modern compilers have extensions that allow bits of assembly to be inserted alongside the usual C or C++ code.

Unless the compiler is also patched to either disallow inserted assembly, or to modify the inserted assembly (this being both hard and dangerous), someone who wants to exploit the bug will just add their own inserted assembly code that exploits the bug, and a patched compiler won't help one bit in that case.

jgowdy|8 years ago

The problem I see with this concept is ROP mitigations like Intel’s control flow enforcement don’t seem compatible with intentionally using tweaked addresses with ret. The address they inject won’t match the shadow stack and the program will be terminated.

DannyBee|8 years ago

This is true, and so far nobody has a better idea. (I.e., I would expect that unless someone comes up with one, hardware CFE in its current form dies, and won't happen for Intel until the processors are changed in a way that makes the mitigation unnecessary.)

teilo|8 years ago

Isn't it the case that the Itanium architecture would not be vulnerable to Spectre because it moves the onus of branch prediction from the CPU to the compiler?

als0|8 years ago

Assuming the compiler knows what it's doing :)

nathell|8 years ago

I can't help thinking of how the early-ITS approach to security (not only was there none, but looking at other users' work was a deliberate feature) was embraced by its users. I'm way too young to remember, but it rings a bell somewhere down my heart.

There's a lot of prominence being given to all kinds of damage malicious users might inflict, and ways to prevent or mitigate, but little to the malice itself. Whence does it arise? What emotions drive those users? What unmet needs?

Meanwhile, when these slowing-down patches for Spectre and Meltdown arrive, I intend not to run them, to the extent possible. I intend to keep aside a VM with patches for critical stuff, like banking or others' data entrusted to me. But I don't want my machine to be slowed down just because someone, sometime, might invest effort in targeting these attacks at it. Given how transparent I want to be with my life, that's a risk I'm willing to take.

fwip|8 years ago

Most attacks aren't targeted at specific people. Hackers don't want to read your emails, they want your credit-card information, digital account passwords, or to compromise your computer to use in their botnet.

Sure, you might not have anything you want to hide in your life, but the drive-by javascript doesn't care about your secrets - it'll hack you anyway. Best-case scenario, you lose access to a bunch of accounts you used to use and need to create new identities from scratch. Worst-case, they clean you out financially, steal your identity, etc.

fooker|8 years ago

retpoline seems to be a novel concept. Can anyone ELI5?

Also, any insight about performance impact here?

tptacek|8 years ago

An indirect jump is when your program asks the CPU to transfer control to a location that your code itself computes: "jmp %register". Compare to a direct jump, where the destination of the jump is hardcoded into the jump instruction itself: "jmp $0x100".

Most programs have indirect jumps somewhere. Higher-level languages with virtual function calls have lots of indirect jumps, because they parameterize functions: to get the "length" of the variable "foo", the function "bar" has to call one of 30 different functions, depending on the type of "foo"; the function to call is read out of a table at some offset from the base address of "foo". Or, another example is switch statements, which can compile down to jump tables.

What we want, to mitigate Spectre, is to be able to disable speculative execution for indirect jumps. The CPU doesn't provide a clean way to do that directly.

So we just stop using the indirect jump instructions. Instead, we abuse the fact that "ret" is an indirect jump.

"Call" and "ret" are how CPUs support function calls. When you "call" a function, the CPU pushes the return address --- the next instruction address after the "call" --- to the stack. When you return from a function, you pop the return address and jump to it. There's a sort of "jmp %register" hidden in "ret".

You abuse "ret" by replacing indirect jumps with a sequence of call/mov/jump, where the mov does a switcheroo on the saved return address.

The obvious next question to ask here is, "why don't CPUs predict and speculatively execute rets?" And, they do. So the retpoline mitigates this: instead of just "call/pop/jump", it does "call/...pause/jmp.../mov/jmp", where the middle sequence of instructions set off in "..." is jumped over and not executed, but captures the speculative execution that the CPU does --- the CPU expects the "ret" to return to the original "call", and does not know how to predict around the fact that we did the switcheroo on the return address.
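Concretely, the thunk looks something like this (AT&T syntax; a sketch of the general shape described in the patch, not the exact code LLVM emits, and the label names are made up):

```
__retpoline_thunk_r11:          # hypothetical name; replaces "jmp *%r11"
        call  .Lset_up_target   # pushes .Lcapture as the return address
.Lcapture:
        pause                   # the CPU's speculative "return" to the
        lfence                  # call site lands here and spins harmlessly
        jmp   .Lcapture
.Lset_up_target:
        mov   %r11, (%rsp)      # the switcheroo: overwrite the saved
        ret                     # return address, so ret jumps to *%r11
```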

How'd I do?

revelation|8 years ago

retpoline is just a convoluted way of doing an indirect jump/call designed to make branch prediction entirely useless. It's a novel concept because doing this is completely opposite to making a program run faster.

Here is an example of the most common programming patterns that end up causing indirect jumps/calls:

https://godbolt.org/g/eThmnG

Imagine every virtual function call in a C++ program being mispredicted and taking twice as long.

(Instead of forcing us to recompile the world, maybe Intel should just disable branch prediction in microcode.)

sanxiyn|8 years ago

By design, with retpoline indirect branches won't be able to take advantage of branch prediction. This is nontrivial, but can't be helped. Performance impact should be negligible otherwise.

Pelam|8 years ago

Maybe some future architecture will allow software to tell CPU which regions it considers to be secret from the point of view of each other region.

Something like that could allow the CPU to speculate aggressively while preventing information-leak exploits.

pwg|8 years ago

The CPU hardware already has that feature. It is the VM paging system and the permissions assigned thereto.

The bug here is that the CPU is not aborting the speculation when fetches occur to addresses marked as "access denied". Instead the fetch happens, and a line of normally inaccessible memory is pulled into the cache by code that should not have been able to read it.

One hardware fix would be to plug that hole. Speculative reads get blocked when they encounter permission denied errors from the paging system and do not change the cache state. That blocks the Meltdown attack, but not the Spectre attack.

userbinator|8 years ago

This is horrible, really really horrible. And I'm not talking about the bug itself, but the mitigation --- which is basically "stop using indirect jump and call instructions and recompile all your software". The latter is beyond unrealistic.

It also sets a very bad precedent: I understand people want to mitigate/fix as much as possible, but this is basically giving an implicit message to the hardware designers: "it doesn't matter if our instructions are broken, regardless of how widespread in use they already are --- they'll just fix it in the software."

hn_throwaway_99|8 years ago

> it doesn't matter if our instructions are broken, regardless of how widespread in use they already are --- they'll just fix it in the software.

What other options are there? It's hardware; it cannot be patched. Of course they will change chip designs going forward, but what else do you suggest folks do with the billions of chips that exhibit this problem?

ychen306|8 years ago

Go ahead, smash your computer, wait a few months, and buy a new one.

sempron64|8 years ago

It's noted in the patch that one would have to recompile linked libraries, which seems impractical, unless a distro decides to build everything with this flag.

imtringued|8 years ago

And since this patch is opt-in, it isn't enough to secure cloud providers.

jacquesm|8 years ago

Not just linked binaries, also the whole underlying OS, and, critically, the compiler itself. Otherwise you could replace the 'proofed' construct with one that is not proofed against the bug.

strongholdmedia|8 years ago

As Alex Ionescu has put it:

> We built multi-tenant cloud computing on top of processors and chipsets that were designed and hyper-optimized for

> single-tenant use. We crossed our fingers that it would be OK and it would all turn out great and we would all profit.

> In 2018, reality has come back to bite us.

This is the root of all the problems.

crb002|8 years ago

This was the fix I was going to suggest. Especially with AVX leakage.

Right now many function calls don't safely wipe registers or the new side-channel state found with Spectre. There really needs to be two kinds of function calls. Maybe a C pragma?

The compiler has parent-function call wiping as a flag; the code has pragmas that override the flag.

lousken|8 years ago

what about performance impact after new CPU architecture arrives? how is that going to work?

eptcyka|8 years ago

Mill can't come soon enough.

mike_hearn|8 years ago

What makes you think the Mill would be immune to these issues?

silimike|8 years ago

If this were 15 years ago, I'd say the site was SlashDotted.

andrewmcwatters|8 years ago

In other news, Intel has found that by not using a computer at all, though performance overheads increase 100%, this counter-measure does secure any previously available attack vectors.