What goes on at the chip level is terrifying. "You have to understand," a hardware engineer once said to me, when we shipped a consumer computer that was clocking its memory system 15% faster than the chips were rated for, "that DRAMs are essentially analog devices." He was pushing them to the limit, but he knew where that limit was, and we never had a problem with the memory system.
There was a great TR from IBM describing memory system design for one of their PowerPC chips. Summary: Do your board layout, follow all these rules [a big list], then plan to spend six months in a lab twiddling transmission line parameters and re-doing layout until you're sure it works . . .
True horror story: the first board I was involved with when working at Lucent in 2001 was a monster modem card that provided something like 300 modems, plus or minus. For a long time during development we had a weird reliability problem that just could not be tracked down, until, in desperation, a hardware engineer started putting his scope probe everywhere ... and found a line (or set of them) where the signal was messed up in the middle of the trace but clean at the end!
Most circuits work probabilistically and come up in a random state because of timing and thermal noise, so it's basically impossible to get the exact same running state twice.
For a small example, look at the simplest SR latch circuit: it can go metastable because it feeds back into itself.
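As a toy illustration (a Python sketch of two cross-coupled NOR gates updated in lockstep; gate delays and the analog behavior are exactly what it omits), the latch holds state purely through feedback, and the forbidden release of S and R together never settles in this model:

```python
def nor(a, b):
    return 0 if (a or b) else 1

def sr_latch(s, r, q, qn, steps=10):
    """Iterate two cross-coupled NOR gates until (hopefully) stable."""
    for _ in range(steps):
        q, qn = nor(r, qn), nor(s, q)  # simultaneous update of both gates
    return q, qn

q, qn = sr_latch(s=1, r=0, q=0, qn=0)   # Set: Q goes to 1
q, qn = sr_latch(s=0, r=0, q=q, qn=qn)  # inputs released: feedback holds Q=1
print(q, qn)                            # 1 0

# The classic hazard: assert both inputs, then release both at once.
q, qn = sr_latch(s=1, r=1, q=0, qn=0)   # forbidden state: Q = Qn = 0
print(sr_latch(s=0, r=0, q=q, qn=qn))   # oscillates between (0,0) and (1,1)
```

In real hardware that last race is resolved by noise and mismatched gate delays, which is exactly why the outcome is nondeterministic.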
The success of this approach to corrupting memory depends upon knowing the geometry of the memory chip. Naive calculations of which addresses correspond to an adjacent row may be incorrect.
It's interesting to see this issue addressed in 2015. In 1980 I worked at Alpha Microsystems and designed a memory chip test program which used translation tables based upon information we required chip manufacturers to give us in order for their chips to be used in the systems we sold.
That approach required us to put only one type of memory chip on a given memory board. But back in the day, microsystems were expensive and customers expected them to be well-tested.
Back in mid-1979, just before I went to college, I looked at the alternatives then available and picked Alpha Micro as the company to go with for a system to computerize a bunch of doctors' offices. It worked very well, as did another one a few years later that was used to help systematize a company providing satellite TV gear.
Actually, triggering rowhammer-induced bit flips does not require knowing the memory geometry. It is possible to trigger bit flips just by picking random pairs of addresses to hammer. See the README file at https://github.com/mseaborn/rowhammer-test.
I work in the chip industry. This was a good paper.
1. Note that chip-kill, Extended ECC, Advanced ECC, and Chipspare (all similar server-vendor schemes for 4-bit correction) will prevent this problem. These schemes are enabled on the higher-reliability server systems.
2. This failure mode has been known to the DRAM industry for a couple of years now, and the newest DRAM parts being produced have it solved. The exact solution varies by DRAM vendor. I wish I could go into specifics, but I am unaware of any vendor that has stated its fix publicly.
Excellent article! The fact that they can reliably produce errors in most RAM chips is worrying. They also propose a solution (probabilistic refresh of neighboring rows).
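That mitigation (often called PARA) is easy to sketch: every time a row is activated, also refresh its neighbors with some small probability. The sketch below is purely illustrative Python with assumed numbers (p = 0.001, and the roughly 139K-activation figure from the paper's worst case); the real mechanism would live in the memory controller:

```python
import random

def on_activate(row, p=0.001, rng=random.random):
    """PARA sketch: with probability p, also refresh both neighbors
    of the row being opened."""
    return [row - 1, row + 1] if rng() < p else []

# Why a tiny p is enough: an attacker needs on the order of 139K
# activations within one refresh interval, and the chance a victim
# row is never refreshed across that many coin flips is:
survival = (1 - 0.001) ** 139_000
print(survival)  # vanishingly small, around 4e-61
```

The appeal of the probabilistic approach is that it is stateless: the controller needs no counters per row, just a coin flip per activation.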
The scary thing is that such a "solution" could be silently disabled, with effectively no signs of any problem (and in fact it would probably increase performance a little!) - I mentioned this in the previous discussion here: https://news.ycombinator.com/item?id=8716977
I know there are other parameters of the memory controller that could be changed to cause corruption, e.g. reducing the refresh rate or tweaking the timings, but that is likely to yield random corruptions in normal use instead of this precise one.
To me, the real solution seems to be to stop making DRAM on such high-density processes until design changes can make it as reliable as before. At some point it just stops behaving like real memory and turns into a crude approximation of it; memory should reliably store the data it holds, without any corruption regardless of access pattern.
How long until someone uses this as the basis of an exploit? Maybe not root access, but if you can figure out an OS call that replicates the access pattern, you can corrupt machines just by interacting with them.
I'd start looking at this for ideas. It's a paper on using memory errors to break out of a Java virtual machine. They didn't have a good way of generating the errors, so they resorted to waiting for them to occur naturally.
There's no need for an OS call. Just userspace access to the same mapped memory is going to stay in the same physical page for far, far longer than a few hundred thousand DRAM cycles. Obviously the hard part of an exploit would be locating those corrupted bits elsewhere in the system. That's going to depend entirely on the hardware layout of the DRAM chip.
One wonders if this has already been used in an exploit.
A good first check for security companies - examine all known attacks for fence instructions, which are rare. (Without a fence instruction, hammering on the same addresses will just cycle the caches, and not go out to DRAM.) Look at the code near them for a hammering loop.
This is a promising attack, because it might be able to break through a virtual machine boundary.
A test for this should be shipped with major Linux distros, and run during install. When someone like Amazon, Rackspace, or Google sends back a few thousand machines as rejects, this will get fixed.
Note that for the test to do row hammering effectively, it must pick two addresses that are in different rows but in the same bank. A good way of doing that is just to pick random pairs of addresses. If your machine has 16 banks of DRAM, for example (as various machines I've tested do), there should be a 1/16 chance that the two addresses are in the same bank. This is what the test above does. (Actually, it picks >2 addresses to hammer per iteration.)
Be careful about running the test, because on machines that are susceptible to rowhammer, it could cause bit flips that crash the machine (or worse, bit flips in data that gets written back to disc).
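The 1/16 figure is just the arithmetic for uniform bank striping. A throwaway Monte Carlo sketch (assuming 16 banks and a uniform address-to-bank mapping, which real controllers only approximate):

```python
import random

BANKS = 16  # assumed bank count, as in the comment above

def same_bank_fraction(trials=200_000, seed=0):
    """Estimate the chance that two randomly chosen addresses land
    in the same bank, assuming uniform striping across BANKS banks."""
    rng = random.Random(seed)
    hits = sum(rng.randrange(BANKS) == rng.randrange(BANKS)
               for _ in range(trials))
    return hits / trials

print(same_bank_fraction())  # close to 1/16 = 0.0625
```

So hammering random pairs wastes roughly 15 out of 16 attempts, but still finds same-bank pairs quickly without knowing the DRAM geometry.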
Hopefully this doesn't affect ECC DRAM? Also, does the problem get worse with increased density, i.e. are 16GB modules more vulnerable than, say, the 8GB ones?
It helps, but they point out you can get two-bit errors that ECC can merely detect (or, much more rarely, errors of 3+ bits that ECC isn't even guaranteed to detect). Mighty tricky to work out the exploit that flips just the right two security-relevant bits, though.
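The correct-one/detect-two behavior is a property of SECDED codes. Here is a toy Hamming(7,4)-plus-overall-parity sketch in Python (real DRAM ECC uses wider codes over 64-bit words, so this only shows the mechanism):

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword:
    Hamming(7,4) in positions 1..7, overall parity in position 0."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]          # covers positions 1,3,5,7
    p2 = d[0] ^ d[2] ^ d[3]          # covers positions 2,3,6,7
    p4 = d[1] ^ d[2] ^ d[3]          # covers positions 4,5,6,7
    cw = [0, p1, p2, d[0], p4, d[1], d[2], d[3]]
    cw[0] = sum(cw) % 2              # overall parity
    return cw

def decode(cw):
    """Return (nibble, status); None for an uncorrectable word."""
    syndrome = 0
    for pos in range(1, 8):
        if cw[pos]:
            syndrome ^= pos          # XOR of set-bit positions
    overall = sum(cw) % 2
    if syndrome and not overall:     # parity even but syndrome nonzero
        return None, "double-bit error detected"
    cw = list(cw)
    if overall:                      # odd parity: a single-bit error
        cw[syndrome] ^= 1            # syndrome 0 means the parity bit itself
        status = "single-bit error corrected"
    else:
        status = "ok"
    d = [cw[3], cw[5], cw[6], cw[7]]
    return sum(b << i for i, b in enumerate(d)), status
```

A rowhammer-induced single flip is silently corrected; two flips in the same word are only detected, which turns the exploit into a crash; and three or more can be miscorrected, which is the scary residue.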
Problem: they propose a solution and calculate its reliability. Why not test it with their FPGA-based memory controller and demonstrate an improvement?
Second: while the problem looks real enough, the tests used to demonstrate it are not realistic. Hammering the same rows with consecutive reads does not happen in the real world because of caches, which they get around via flushes. I'd like to see more data on how bad the abuse needs to be to cause the problem. Will 2 reads in a row cause errors? 5? 10? 100? They never address how likely this is to be a real-world problem. I don't doubt that it is, but how often?
They address all of these in the paper. It takes ~180,000 "hammered" reads to trigger. The problem is even the least-privileged code can do it because it's just reading without using the cache - a perfectly valid thing that must be allowed for multi-threaded code to even work correctly.
Secondly, the DRAM makers don't currently provide enough information to reliably know which neighbors to refresh. I suppose they could have used their guesses to test on the FPGA rig, but given the rest of the paper I'm reasonably satisfied that they have correctly identified the problem and that their solution would work.
Reading the same address in an infinite loop is quite common in multi-die/multi-core real-time low-latency systems. In fact, this is exactly what you are doing when polling FIFO queue pointers, etc. And rather than relying on QPI/cache coherency, you may even want to forcefully flush the cache every time you read, to reduce latency...
This kind of thing may not happen often but when it does it can cause errors which are very intermittent and impossible to diagnose. I have seen memory chips that will fail consistently, but only when running a specific program out of dozens of others tried. So it kind of looked like the program had a problem, except that it worked on other hardware.
It's quite easy to bypass the cache without an explicit flush. Most caches are 4-way or 8-way set-associative: just hit 4 or 8 other addresses that map to the same cache set, and you get your original data evicted from the cache.
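In a simple set-associative model, same-set addresses are just the target plus multiples of (number of sets times line size). A Python sketch with assumed geometry (64-byte lines, 64 sets, 8 ways, as in a typical 32 KiB L1; real L2/L3 caches are physically indexed and, on newer Intel parts, hashed across slices, so building eviction sets there is messier):

```python
LINE = 64   # cache line size in bytes (typical)
SETS = 64   # sets in an assumed 32 KiB, 8-way L1 data cache
WAYS = 8

def cache_set(addr):
    """Set index in a simple set-associative cache model."""
    return (addr // LINE) % SETS

def eviction_set(target):
    """Addresses mapping to the same set as `target`: touching
    WAYS + 1 of them forces the target's line out of a WAYS-way cache."""
    stride = SETS * LINE  # same-set addresses repeat every SETS*LINE bytes
    return [target + (i + 1) * stride for i in range(WAYS + 1)]
```

Hammering via eviction is slower than CLFLUSH but needs no special instruction at all, which matters for sandboxed code that can't emit flushes.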
I've completed many multi-gigahertz product designs during my career. If you take the time to study and understand the physics involved and bother to do a bit of math none of it is particularly difficult. I reject the characterization of this as some kind of a black art. It's not magic. Yes, of course, experience helps, but it isn't magic. One problem is that some in the industry are still using people who do layout based on how things look rather than through a scientific process. Yes, it's analog electronics. When was it anything else?
Want to wrap yourself around another challenging aspect of high-speed design? Power distribution system design (PDS). You can design perfect boards based on solid transmission line and RF theory and have them fail to work due to issues such as frequency-dependent impedances and resonance in the PDS.
So basically you just make a couple of memory reads a few hundred thousand times and this will alter some nearby cell? Why didn't manufacturers test for this? It seems like a pretty obvious thing to test when working at these scales.
Because the applications that care the most about these types of errors (big-iron networking and mil/aero) implement software error correction in the processor. It's not worth spending the extra money in final test when commodity DRAM is, well, a commodity.
I think row hammer is basically a DRAM design defect and wish it was fixed in the DRAM instead of on the controller side. At the very least the DRAM vendors should document this access pattern limitation in their datasheets.
Am I wrong (asking anyone who happens to work on processor microcode), or could per-processor microcode patches insert a minimum delay where needed, based on RAM parameters and organization, to prevent this?
Analog is a black art.
https://github.com/CMU-SAFARI/rowhammer
In Ubuntu 14.04, run this to bring in all the dependencies for building: sudo apt-get build-dep memtest86+
Update: just finished running the test on my cheap Lenovo laptop. Not affected. phew! :)
Announcement http://www.passmark.com/forum/showthread.php?4836-MemTest86-...
Previous discussion https://news.ycombinator.com/item?id=8713411
How new exactly? The newest tested in the paper is from July 2014 and that still has the problem.
https://www.cs.princeton.edu/~appel/papers/memerr.pdf
High-speed DRAM reads influence nearby cells. Reproduced on 110 of 129 memory modules after 139K reads, on both Intel and AMD systems; up to 1 in 1.7K cells affected.
Why not?
If the attack can only be reproduced by custom hardware, why should anyone care?
Also, precise patterns of access to DRAM would require disabling the L1 and L2 caches. Doesn't that sort of thing require privileged instructions?
With caching in place, memory accesses are indirect. You have to be able to reproduce the attack using only patterns of cache line loads and spills.
They evict cache lines using the CLFLUSH x86 instruction, which I believe is unprivileged.
There aren't that many DRAM manufacturers in the world. Micron, Samsung, Elpida, and Hynix are all excellent bets.
https://news.ycombinator.com/item?id=8713411
Edit: Just checking, it was at 120, looks like it got artificially promoted to 25, then sank quickly.
http://hnrankings.info/8713411/