Time protection: the missing OS abstraction

[+] niftich|7 years ago|reply

This post lifts out the key parts of the paper [1] and is a good summary. I think the paper is an accessible read as well.

There was not much discussion two weeks ago when the paper was posted on HN [2], so I will raise a point that I've made before [3][4][5] and that is consistent with the recommendations of the paper (and another post in this thread [6]): this is an opportunity to improve the terminology, mental models, and formalisms of observable state, and its implications for information hiding, privilege separation, and computer system design.

This conversation needs not only to occur among (e.g.) computer chip designers and cryptography experts, but also among higher-stack users of that technology, so that the information leakage aspects and trade-offs can be analyzed together with other performance indicators of the system.

It seems as if the haphazard, ad hoc way that chipmakers and system architects dealt with this issue has contributed to an environment where Spectre could occur. Such timing attacks were never a secret, but resistance to them at various levels of mainstream computing appears to have been fitted in as a patchwork of hasty fixes and well-meaning but informal caution. The conversation around this topic could use an upgrade, and the paper's authors agree.

[1] https://ts.data61.csiro.au/publications/csiroabstracts/Ge_YC...
[2] https://news.ycombinator.com/item?id=19547293
[3] https://news.ycombinator.com/item?id=17308014
[4] https://news.ycombinator.com/item?id=16165942
[5] https://news.ycombinator.com/item?id=19644997
[6] https://news.ycombinator.com/item?id=19670296

[+] ccvannorman|7 years ago|reply

You might even say there's room for a startup to reinvent computers (and operating systems) from the ground up.

Would you?

[+] akersten|7 years ago|reply

It saddens me that we're collectively going to spend a lot of effort trying to patch out a problem that we've imposed upon ourselves. We were making such great progress in terms of processing speed until someone came along and decided that we need to have multiple tenants share the same hardware, and they should have no way of knowing anything about each other. The vast majority of consumer hardware will _never_[0] be exposed to this category of attack, but will pay the performance penalty regardless.

Fundamentally, the need is for a completely different model of computation to abstract away time-channel leaks. This cannot be fixed by patching existing software and hardware, and we're going to go through a lot of pain and anguish trying. As another comment points out, the well of possible timing attacks is infinitely deep (attached hardware, network performance measurements, etc.).

The two options are performance or security, pick one. It seems the industry is trying to pick both, and it's going to take us a long time to realize that we're going to get neither.

For clarity - my proposal is segmenting hardware and software products between the two categories of "general purpose, trusted computing" and "safe for shared hosting." The 2nd category is so small compared to the first, it seems unfair that its domain-specific problems should hamper the rest of us.

[0] Thanks to a combination of reasonable software mitigations (unprivileged lower-resolution timers) and the fact that most of these attacks require arbitrary code execution in the first place

[+] speedplane|7 years ago|reply

> my proposal is segmenting hardware and software products between the two categories of "general purpose, trusted computing" and "safe for shared hosting."

If a normal user is using a web browser and one tab has their bank information, and the other has a suspect website, then you have to be concerned about sharing resources and security.

[+] gnode|7 years ago|reply

> The vast majority of consumer hardware will _never_[0] be exposed to this category of attack

Arbitrary code execution is common in the browser (JavaScript and WebAssembly). Really, this applies to any case where you don't entirely trust every program running on the device (e.g. smartphones).

> Thanks to a combination of reasonable software mitigations

That is time protection. Restricting access to system timers wasn't enough here; mitigations also need to prevent user-created high resolution timers, so useful features like SharedArrayBuffer had to be disabled, to prevent the creation of synthetic timers.
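To make the "synthetic timer" point concrete, here is a minimal sketch in C of the general idea: one thread increments a shared counter as fast as it can, and any other thread reads that counter as a clock, so no system timer is involved. This is only an analogy for what SharedArrayBuffer plus a worker allows in a browser; the names synthetic_timer_start, synthetic_timer_stop, and synthetic_timer_now are hypothetical, and POSIX threads with C11 atomics are assumed.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Shared state: a counter that a helper thread bumps in a tight loop. */
static _Atomic uint64_t ticks = 0;
static atomic_bool running = false;
static pthread_t ticker;

/* Helper thread body: increment the shared counter until asked to stop. */
static void *tick_loop(void *arg) {
    (void)arg;
    while (atomic_load_explicit(&running, memory_order_relaxed)) {
        atomic_fetch_add_explicit(&ticks, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Start the counting thread. Returns 0 on success (pthread_create result). */
int synthetic_timer_start(void) {
    atomic_store(&running, true);
    return pthread_create(&ticker, NULL, tick_loop, NULL);
}

/* Stop the counting thread and wait for it to exit. */
void synthetic_timer_stop(void) {
    atomic_store(&running, false);
    pthread_join(ticker, NULL);
}

/* Read the current tick count; the unit is "loop iterations", not seconds. */
uint64_t synthetic_timer_now(void) {
    return atomic_load_explicit(&ticks, memory_order_relaxed);
}
```

The difference between two synthetic_timer_now() readings gives a relative duration without touching any OS clock, which is why restricting system timers alone does not close the channel.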

> The two options are performance or security, pick one.

That is not what the paper found: "Across a set of IPC microbenchmarks, the overhead of time protection is remarkably small on x86 [1%], and within 15% on Arm."

I think if there's a segmentation to be made, it's "general purpose, untrusted computing" and "trusted high performance computing". The second category would be the reserve of such projects as physics simulations and render farms.

[+] jankotek|7 years ago|reply

Every phone runs JavaScript in the browser.

[+] peterkelly|7 years ago|reply

> someone came along and decided that we need to have multiple tenants share the same hardware, and they should have no way of knowing anything about each other

Isolation between apps on mobile phones is really important. That's a huge part of the computing landscape in terms of number of devices deployed, and falls within the 2nd category. I don't think it's realistic to dismiss it so easily.

[+] naasking|7 years ago|reply

Very cool ideas. Time protection is indeed a must going forward, particularly for cloud hosting. Not surprised that seL4 got there first either.

Getting this to work in raw Linux may be hopeless given the breadth of the kernel data, but they have smart devs, so maybe they'll figure something out.

And Rob Pike said systems research is dead. Ha!

[+] dooglius|7 years ago|reply

One issue I see here is that time protection would need to extend to anything shared, not just CPU micro-architecture. For instance, if a hard drive has a DRAM-based cache, that could be used as a timing channel, and the complexity of flash file systems opens up all kinds of potential leaks. In the case of two processes sharing network access, one process could conceivably estimate another's network access patterns implicitly by measuring latency through shared switches or drops due to buffers being filled. Mitigating this would require some kind of coloring support that goes as far as your ISP's switches, which seems impractical.

[+] adrianratnapala|7 years ago|reply

> One issue I see here is that time protection would need to extend to anything shared,

We might see this issue as an opportunity. That is, by thinking about a concept called "time protection" we expose all the things these subsystems are doing and make them easy to argue about. We can now say "Oh good, XYZ improves best-case speed, but sadly it also compromises time protection".

Having such a language means the industry can slowly start improving these things rather than sweeping them under the rug. It will not stop the improvement from being slow and difficult.

[+] im3w1l|7 years ago|reply

I think there needs to be a little bit of contribution from both the hardware and software sides. A "sufficiently stupid" application is not possible to protect. We need a set of best practices and guarantees that you won't leak if you follow them.

And there may even need to be a third part. An understanding that nothing can be fully protected.

[+] _cs2017_|7 years ago|reply

As an alternative to this approach, I wonder if it's possible to push all sensitive computations into a few small components, and rewrite those components carefully to obscure any information that could be obtained from timing?

[+] FrozenVoid|7 years ago|reply

Of course. Branchless code equivalents can be written for practically anything, and you can force-prefetch memory regardless of the branch (it's an intrinsic in most C compilers), though this loses performance and branch prediction benefits.
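As an illustration of what "branchless equivalents" can look like, here is a small sketch in C. The helper names ct_select_u32 and ct_equal are hypothetical, not from any particular library. Treat it as a sketch only: an optimizing compiler is free to reintroduce branches, so real constant-time code is normally checked at the machine-code level.

```c
#include <stddef.h>
#include <stdint.h>

/* Return a when choose_a is 1 and b when choose_a is 0, using a bit mask
 * instead of an if/else on the (possibly secret) selector. */
uint32_t ct_select_u32(uint32_t choose_a, uint32_t a, uint32_t b) {
    uint32_t mask = (uint32_t)0 - choose_a; /* 0xFFFFFFFF or 0x00000000 */
    return (a & mask) | (b & ~mask);
}

/* Compare two buffers of equal length. Always reads all len bytes and
 * never exits early on the first mismatch. Returns 1 if equal, else 0. */
int ct_equal(const uint8_t *x, const uint8_t *y, size_t len) {
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++) {
        diff |= (uint8_t)(x[i] ^ y[i]); /* accumulate any differing bits */
    }
    /* diff == 0 wraps to 0xFFFFFFFF, so bit 8 is set; otherwise it is clear. */
    return (int)((((uint32_t)diff - 1u) >> 8) & 1u);
}
```

The loop bound depends only on len, which is assumed to be public; only the buffer contents are treated as secret here.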

[+] myWindoonn|7 years ago|reply

Very dramatic graphs.

I wonder when programming languages will start factoring out the ability to check system timers. Some capability-aware languages have already done so.

[+] 6keZbCECT2uB|7 years ago|reply

I wonder whether, if high-resolution timers were privileged, we could get by with lower-resolution timers. I'm not sure any timing attacks would work with second or even millisecond resolution timers.

I don't see how handling this at the programming-language level could help, and I think that whether timing is privileged or not is built into CPUs, so there's nothing we can do about it, but this seems plausibly acceptable as a way to deal with speculation. Permit speculation, but make it privileged to detect whether speculation occurred.
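For reference, the lower-resolution timer idea amounts to rounding timestamps before handing them to unprivileged code. A rough sketch in C follows; the function name coarsen_timestamp_ns is an illustrative assumption, not any real OS or browser API. As noted elsewhere in this thread, coarsening the system clock does not by itself stop code from constructing its own finer-grained timer.

```c
#include <stdint.h>

/* Round a nanosecond timestamp down to a multiple of granularity_ns.
 * A granularity of 0 is treated as "no coarsening" to avoid dividing by zero. */
uint64_t coarsen_timestamp_ns(uint64_t t_ns, uint64_t granularity_ns) {
    if (granularity_ns == 0) {
        return t_ns;
    }
    return t_ns - (t_ns % granularity_ns);
}
```

With a 1 ms granularity, coarsen_timestamp_ns(1234567, 1000000) returns 1000000, so two events less than a millisecond apart often become indistinguishable by this clock.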

[+] kccqzy|7 years ago|reply

I don't think so. I think the whole point is that the OS provides time protection in the same manner it provides memory protection: completely transparent to the application. Just like your typical user-mode application does not need to worry about virtual vs physical addresses, I'd say the typical user-mode application would not need to worry about the effect of time as well.

[+] irq-1|7 years ago|reply

Just a thought, but can't the OS prevent applications from knowing anything about other applications? Rather than isolating apps by flushing/coloring everything, couldn't the apps not know what else is running? Two apps can't communicate, or one app spy on another, if an app doesn't know what else was/is running.

(Not sure why this is wrong, but confident it must be.)

[+] psds2|7 years ago|reply

These exploits work on a hardware level, and do not require the malicious app to know what else is running. For example, a VM does not know what other VMs are running on the same host in AWS, but Spectre/Meltdown still affected AWS hosts. They are reading data other apps have written to memory.

[+] JoeAltmaier|7 years ago|reply

Or, don't let timing reveal privileged information? This seems like a sledgehammer for a fly problem.

[+] dooglius|7 years ago|reply

How is what you're talking about different from what the article describes?

[+] simion314|7 years ago|reply

I would like to see how Boeing and the FAA decided to pretend that all was fine after the first crash. There were enough clues that MCAS had issues, and I would like to see how it was decided that it was safe to fly while the software fix was not deployed.

51 comments