Was the connection with speculative execution already being discussed openly? I know about https://cyber.wtf/2017/07/28/negative-result-reading-kernel-..., but not about anything between that and 28 Dec suggesting someone made it work and that's the reason for KPTI.
If it wasn't in the open, seems... not ideal embargo-wise for AMD to leak it there. Though no one in that thread is complaining about the disclosure, so maybe they either think that part is already known to anyone looking closely, or just don't think it's a very big piece of the exploit puzzle (like, finding the way to get information out via a side channel was the hard part).
It wasn't publicly acknowledged but people figured it out already. Take a look at https://news.ycombinator.com/item?id=16046636 (both the article and the comments) for example. This wasn't going to stay secret much longer.
https://twitter.com/dougallj has released source code (https://t.co/vaaMyajriH) which partially reproduces the problem. You need a little bit of tweaking to read kernel memory and to read the actual values. From his Twitter, and from what I've observed, sometimes the speculative code will see 0 and sometimes it will see the correct value. He speculates that it might work if the value is already in the cache.
This is going to have a dramatic effect on the cloud computing market. It might make sense to make sure any VMs you run are on AMD processors; otherwise this can really hurt your performance and basically cost you more to do the same workload.
It also seems, from early benchmarks, this can slaughter performance with databases.
Don't worry. I don't think that there will be two separate kernels for Intel and AMD. I think the performance drop will land on both CPUs no matter whether they have the bug or not.
This feels like a big FU to Intel. I've heard this patch can slow down programs like du by 50%. Does that mean AMD is going to find itself running twice as fast as competitors?
I think the du case was an outlier. Normal workloads shouldn't be so heavily affected. I am expecting a few percent loss on most programs, though. It's basically a larger penalty for making a syscall, which was already a fairly slow operation, so performance-minded people avoid them in tight loops. It will be bad for people who need to do lots of fast I/O, I suspect.
"The overhead was measured to be 0.28% according to KAISER's original authors,[2] but roughly 5% for most workloads by a Linux developer.[1]" [1] = https://lwn.net/Articles/738975/
Though the patches evolved since then. So I guess we'll see.
Yes. AMD didn't take shortcuts, and implemented the spec correctly. Intel took shortcuts and introduced bugs, and now, to compensate for that, the OS has to work around it in software, which is going to be slow. For years Intel has reaped the performance benefits of those shortcuts while AMD has been implementing things correctly; now there is a correction.
The text as written only seeks to defend AMD's product. Whether the subtext goes further is open to non-objective speculation. Having said that, I'm sure AMD are feeling pretty happy with their statement. Schadenfreude may be too long a bow...
All Intel CPUs are affected, the mitigation increases syscall overhead by 50%, and no AMD CPUs are affected? I would say this could be an indicator to short INTC and long AMD...
If the hit is as bad as they say (30% performance), cloud providers will be almost forced to upgrade when the new hardware comes out that fixes it. Are they really ready to adopt AMD? Go long on INTC?
Essentially looks like Intel compromised (whether intentional or not is a different point) the design to get the speed boost that gave them the lead over AMD for the past decade. Will be interesting to see how all this plays out.
Other than leaking timing information though, is there any reason why this kind of speculative execution can't be secure? Apparently we're going to find out more in the coming weeks, but it feels strongly like Intel has made a number of mistakes leading up to this.
At the meta level this is just a special case of "complexity is evil" in security. CPUs have been getting more and more complex, and the relationship between complexity and bugs (of all types) is exponential. Each new CPU feature exponentially increases the likelihood of errata.
A major underlying cause is that we're doing things in hardware that ought to be done in software. We really need to stop shipping software as native blobs and start shipping it as pseudocode, allowing the OS to manage native execution. This would allow the kernel and OS to do tons and tons of stuff the CPU currently does: process isolation, virtualization, much or perhaps even all address remapping, handling virtual memory, etc. CPUs could just present a flat 64-bit address space and run code in it.
These chips would be faster, simpler, cheaper, and more power efficient. It would also make CPU architectures easier to change. Going from x64 to ARM or RISC-V would be a matter of porting the kernel and core OS only.
Unfortunately nobody's ever really gone there. The major problem with Java and .NET is that they try to do way too much at once and solve too many problems in one layer. They're also too far abstracted from the hardware, imposing an "impedance mismatch" performance penalty. (Though this penalty is minimal for most apps.)
What we need is a binary format with a thin (not overly abstracted) pseudocode that closely models the processor. OSes could lazily compile these binaries and cache them, eliminating JIT program launch overhead except on first launch or code change. If the pseudocode contained rich vectorization instructions, etc., then there would not be much if any performance cost. In fact performance might be better since the lazy AOT compiler could apply CPU model specific optimizations and always use the latest CPU features for all programs.
Instead we've bloated the processor to keep supporting 1970s operating systems and program delivery paradigms.
It's such an obvious thing I'm really surprised nobody's done it. Maybe there's a perverse hardware platform lock-in incentive at work.
Can Intel release a drop-in CPU that will avoid or mitigate this issue?
The infrastructure investment in Intel cores is huge. If a drop-in replacement lets me minimize downtime, regain performance, and is "cost effective" compared to a cost-prohibitive full replacement, does this result in Intel having a sales INCREASE as it replaces bad silicon?
I don't know enough about this issue to speak to it either way, but I would love to hear if this fix is possible/viable.
Until more information is available, who knows. It might be fixable in microcode, it might be fixable in a new processor stepping, or it might require a deeper rework that won't come out until the next generation of processors (or even the generation after that).
Wouldn't this kind of issue validate the ideas of microkernel-based OSs, where kernel and user spaces are already completely separated?
BTW, removing the kernel from the non-privileged address space seems like such a great idea (which is not a new one at all) that the whole thing should probably have some hardware support to make it fast.
Given Intel's dominance of the server market does this mean that datacenter computational capacity will see an overnight ~5% drop?
Is there enough spare capacity to cope with this? Will spot-instance prices go up? Will I need more instances of a given type to run the same workload?
All that I've read about this so far seems to indicate that it's only a way to bypass KASLR... which is itself not really a problem, but there must be something more to it. Given that it doesn't affect AMD, perhaps it's related to Intel ME?
Data structures stored in kernel space, such as llds [1], will not incur the overhead of the TLB flush/load.
I suspect that storing data in the kernel space in order to avoid maintaining a large application PD will become the norm, whereas in the past it has been reserved for use cases like search engines with massive in-memory trees.
Would it be possible to slow down segfault notifications to mitigate the attack? For example, if the segfault was not in kernel space, halt the application for the time offset of a kernel read. That way all segfaults would be reported at more or less the same time and the attack could be avoided.
Are there any sane apps that depend on timely segfault handling and thus might be affected by such a workaround?
It's not timing the segfault delivery itself, the idea is to time another read of your own address space after the fault to see if it's been prefetched or not.
Maybe you could CLFLUSH on segfault delivery though.
I sometimes wonder if verifying properties of the code we run wouldn't be smarter than relying on hardware isolation. Or at least in addition to hardware isolation, so that there are two layers.
By verify I'm thinking NativeClient-like or JVM isolation.
Obviously, it would entail complete OS rewrite, or maybe partial...
Would it make sense to switch cores at the same time the context is switched between user and kernel? The cache hit is already there and, if one could go back and forth between already-primed caches on different cores, at least some of the performance issues would be mitigated.
AnssiH | 8 years ago:
I imagine if someone had complaints they would make them in private so as to not make the situation even less ideal embargo-wise.
caf | 8 years ago:
http://lkml.iu.edu/hypermail/linux/kernel/1801.0/01274.html
SolarNet | 8 years ago:
That's how the market works.
rdtsc | 8 years ago:
I would say that too if I were waiting for everyone to sell so I could buy INTC :-)
rootlocus | 8 years ago:
If it wasn't intentional, then it wasn't a compromise. So it's not a different point.
bhouston | 8 years ago:
Core 2 architecture? Nehalem?
mindcrash | 8 years ago:
Having a hunch Threadripper will sell extremely well amongst PC enthusiasts this year...
nothrabannosir | 8 years ago:
Don’t forget to correct for the subtle loss in credibility, and subsequent immeasurably subtle dip in sales, amortised over… forever.
rst | 8 years ago:
and the original source for that report: https://twitter.com/aionescu/status/930412525111296000
OSX and BSD variants are an interesting question...
[1] https://github.com/johnj/llds