mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
commit 19be0eaffa3ac7d8eb6784ad9bdbc7d67ed8e619 upstream.
This is an ancient bug that was actually attempted to be fixed once
(badly) by me eleven years ago in commit 4ceb5db9757a ("Fix
get_user_pages() race for write access") but that was then undone due to
problems on s390 by commit f33ea7f404e5 ("fix get_user_pages bug").
In the meantime, the s390 situation has long been fixed, and we can now
fix it by checking the pte_dirty() bit properly (and do it better). The
s390 dirty bit was implemented in abf09bed3cce ("s390/mm: implement
software dirty bits") which made it into v3.9. Earlier kernels will
have to look at the page state itself.
Also, the VM has become more scalable, and what used to be a purely
theoretical race back then has become easier to trigger.
To fix it, we introduce a new internal FOLL_COW flag to mark the "yes,
we already did a COW" rather than play racy games with FOLL_WRITE that
is very fundamental, and then use the pte dirty flag to validate that
the FOLL_COW flag is still valid.
At Appcanary, we're thinking about opening up our vulnerability database to be browsable and searchable by the public. If you're not sure which version has the patch for this vulnerability in your distro, here's what we know:
If you wanted to create a useful tool to promote yourselves, you could make something for CentOS that lets a user apply critical security updates only. yum-security doesn't seem to work on CentOS, as the repos don't have the correct metadata; currently that requires a Satellite subscription.
It's probably the most serious Linux local privilege escalation ever.
Look, the Azimuth people have forgotten more about reliable exploit development than I have ever known, but, no, as stated, this is clearly not true. Not long ago, pretty much all local privesc bugs were practically 100% reliable.
What I think they mean to say is that this is unusually reliable for a kernel race.
I still think, though, that the right mental model to have regarding Linux privesc bugs is:
1. If there's a local privesc bug with a published exploit, assume it's 100% reliable.
2. In almost all cases, whether or not there's a known local privesc bug, assume that code execution on your Linux systems equates to privesc; this is doubly true of machines in your prod deployment environment.
You said it: if you are not explicitly in the business of providing external access to your machine, the privesc isn't your main problem (it's a problem, and it's bad, though); the real problem is that anybody was in a position to exploit the privesc in the first place.
> "In almost all cases, whether or not there's a known local privesc bug, assume that code execution on your Linux systems equates to privesc; this is doubly true of machines in your prod deployment environment."
It depends. I've seen "oh well if someone has rce they probably have root anyway" used way too many times as an excuse to avoid defense-in-depth measures.
Well, another thing to keep in mind with this one in particular is that there is no way to mitigate it. grsecurity can't help with this kind of bug; nothing can. So it may not just be about the reliability of this exploit, but the fact that there's no mitigation other than updating.
I agree. There have been far easier local exploits in the past: for example CVE-2006-2451, whose exploitation was quite simple and involved no race condition, and also CVE-2009-2692 or CVE-2010-3049. Browsing exploit-db makes it easy to find them.
Yup, the best solution here is to make privesc ineffective via VM isolation. Privilege escalations are rampant on most operating systems, so in-OS privilege boundaries are not worth relying on. VM isolation breaks are much rarer.
> 2. In almost all cases, whether or not there's a known local privesc bug, assume that code execution on your Linux systems equates to privesc; this is doubly true of machines in your prod deployment environment.
I think this goes for any mainstream OS, Linux is not particularly special here.
>However that's hard to do when the vast majority of kernel bugs come from vendor drivers, not the upstream Linux kernel, Stoep said.
Doesn't this actually validate Andrew Tanenbaum's argument[1] from over 25 years ago, when he said monolithic operating systems are inherently insecure and a rethink is required?
While it's true that vendor drivers living in kernel space is horrible for security... that's somewhat offtopic here. This particular bug is in the memory management system, which is one of those things that kind of has to be in the kernel. A microkernel architecture seemingly would not have helped in this particular case.
CVE-2016-5195
This flaw allows an attacker with a local system account to modify on-disk binaries, bypassing the standard permission mechanisms that would prevent modification without an appropriate permission set. This is achieved by racing the madvise(MADV_DONTNEED) system call while having the page of the executable mmapped in memory.
Excellent example of why mounting the partition with system binaries (such as /usr) read-only is a good idea. CoreOS does this.
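For reference, a minimal sketch of that setup (the UUID, filesystem type, and mount options are illustrative; adjust for your distro):

```shell
# /etc/fstab entry: keep /usr read-only by default
# (device UUID and fs type here are made up)
# UUID=1234-abcd  /usr  ext4  ro,nodev  0  2

# Remount an already-mounted /usr read-only right now:
mount -o remount,ro /usr

# Flip it back temporarily when upgrading packages:
mount -o remount,rw /usr
```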
Gotta love the dedication, with the Dirty COW "swag" web shop and all. Though something tells me it's just a strange in-joke. Might be the prices? ($1,000 for a mouse pad... oh, really?)
Okay, I have no idea what to do. I'm not a security engineer and can't follow what this thing does, but I do have a couple of VPSes running my blog and a few other things. Now maybe there's an argument that I shouldn't be doing this if I don't completely understand all the ins and outs, but what the hell, I like learning about Linux.
So my question is: is simply updating and upgrading enough to protect me from this MOST DANGEROUS BUG EVER IN THE WORLD OH MY GOD YOU'RE GOING TO END UP PART OF A BOTNET AND HURT LITTLE CHILDREN!!1!!1!? That's how this reads to even a semi-technical reader. I know my way around the command line, but I'm at a loss as to what to do here. Help me out HN please!
Since for any serious bug that's published there are very likely a dozen private or not-yet-found ones, and considering how many networked devices the Linux kernel runs on, I would really like to see a better upgrade story for Android devices and any other Linux-inside gear that doesn't have a distro package manager to apply the fix. As much as I dislike obstructing tech companies with more laws, especially since most laws don't understand the tech, I feel like laws are the only pressure we can hope for.
This is why the abuse of IoT devices is, in a grim way, useful: it highlights how dangerous it is to slap a random Linux version into some device and never bother with updates. It may take a fleet of smart TVs being hijacked with a stalker trojan, used to record and post online the private moments of unsuspecting owners of always-on-standby smart TVs, Amazon Echo networked microphones, and the like. That's just how the world works: it has to see the risks before it does something about them.
As an engineer you can argue and plead with management not to release something for which you don't intend to provide timely updates over a well-communicated support period. Like a 2-year warranty that's prominently communicated, this would highlight to consumers that it's unsafe to use the device unless it's disconnected from the network. Just like a car that doesn't pass your local safety regulations is not allowed into public traffic.
Actually, I'm surprised modern cars do not require periodic zero-cost-to-the-owner software updates at licensed dealerships. You can explain to a driver that tires go bad because they drove X miles and have to be paid for, but you cannot argue that software updates need to be paid for because Y days have passed since they bought the car. Take the Samsung battery optimization that went wrong, where the separation layer was a tiny bit too thin; it's fair to assume some regulation will follow for safety purposes. Similarly, networked devices, which are not (and cannot be?) microcontrollers with a mere 500 lines of code, have to be regulated in terms of software updates.
Now you may say the industry will go broke if it's required to provide upgrades, or that fewer devices will be made, but I think this will lead to consolidation of the software stack, which is mostly a good thing: those who want to produce dozens of cheap IoT devices can do so without hiring kernel developers. It's like other industries, where cheap toy makers source materials like plastic from vendors knowing it's safe, or create the materials following a detailed recipe that is certified.
Can someone help me better understand how this works, or perhaps point me to a decent article explaining more of the details? Most of the articles I can find just briefly explain the exploit, but not really how it works (in detail).
From looking at the example code, it seems like the general process is:
- Open some (normally un-writable) file read-only and mmap it into your process.
- Kick off two threads: one to repeatedly write to the same mmap-ed address via /proc/PID/mem, and another to keep issuing the madvise call.
- Wait for the race to be won such that you're able to write to the kernel's cached copy of the file.
What I don't fully understand is how the /proc/PID/mem thing works.
Here's what I'm curious about:
1. What would happen if you tried to write to the mmap-ed region directly? Since it's been mapped in with PROT_READ, does this mean that you'll get a segmentation fault or something? From the manpage, it seems like MAP_PRIVATE allows it to be a COW mapping, but I don't see how the combination of PROT_READ and MAP_PRIVATE is even valid. Unless this means that any writes to data copied from the mmap-ed region into other buffers will be COW-ed, and that you can't actually write to the mmap-ed region itself? That would make sense to me.
2. How is writing to /proc/PID/mem any different from writing through the mmap-ed region directly? Assume that you weren't running the madvise thread. What would happen then if you tried to write to the /proc/PID/mem file? Presumably the same thing that happens if you just tried to write to the file directly...
3. Finally, how does the madvise call cause a race condition? I realize this might be a little too much to cover in a comment, but this seems like the meat of it.
Doesn't seem like it works on a $10 DigitalOcean droplet (1 vCPU) with grsec-patched 4.4.8. After running for quite some time (which I suspect a system administrator would notice) "cat foo" still outputs the same contents.
If I'm reading this correctly, it works only when there's already access to a user account on the system. So you need to have an existing foothold already [e.g. an untrusted user].
It will be interesting to see whether it yields new root exploits for Android, as suggested in the comments.
If one's running an LTS version of Ubuntu like 14.04 or 16.04, can one expect to get an update with the security patch for this?
I'm running Kubuntu 14.04 with the latest security updates, and I'm still on kernel version 3.13.0-98-generic.
~ $ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty
~ $ uname -a
Linux anon-pc 3.13.0-98-generic #145-Ubuntu SMP Sat Oct 8 20:13:07 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
No idea why I haven't gotten an update to 4.x. Should I just switch to a rolling release distro like Arch to have the latest updates of everything?
The github page [0] states that "The In The Wild exploit relied on using ptrace."
Now, I'm wondering what purpose ptrace serves, aside from debuggers? Why don't we just disable this by default on production systems (where you shouldn't be debugging anyhow)?
> production systems (where you shouldn't be debugging anyhow)
I'm not sure about this. Ideally, yes, but if you don't know what's causing an issue it can be difficult to reproduce it, and strace can be phenomenally helpful in figuring out the cause. Of course, you could leave it off until you think you might be in such a situation.
There are a surprising number of users of ptrace. E.g. upstart uses it to count forks (presumably to mitigate fork bombs), as geofft has pointed out above.
See the SELinux boolean "deny_ptrace", and/or the sysctl "kernel.yama.ptrace_scope", and have at it.
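For reference, the Yama knob mentioned here, as it appears on stock distribution kernels (the sysctl only exists if Yama is compiled in; setting it needs root):

```shell
# Read the current Yama ptrace policy:
# 0 = classic, 1 = restricted (same-descendant only),
# 2 = admin-only (CAP_SYS_PTRACE), 3 = no attach at all
cat /proc/sys/kernel/yama/ptrace_scope

# Restrict ptrace to CAP_SYS_PTRACE holders for this boot:
sysctl -w kernel.yama.ptrace_scope=2

# Persist across reboots:
echo 'kernel.yama.ptrace_scope = 2' >> /etc/sysctl.d/10-ptrace.conf
```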
It's not just for debugging, but for any tool that needs some measure of process control. Probably the next most common ptrace-caller I know is "strace".
So the escalation is read-write access to privileged files; are LXC and Docker container breakouts prevented, then? Also, does /proc access through lxcfs or Docker's handling of /proc make any difference?
commit 89eeba1594ac641a30b91942961e80fae978f839
Author: Linus Torvalds <[email protected]>
Date:   Thu Oct 13 13:07:36 2016 -0700
Ubuntu - https://appcanary.com/vulns/45984
Debian - https://appcanary.com/vulns/45983
Amazon Linux - https://appcanary.com/vulns/45992
CentOS - no patch yet
If you found this useful, please let me know!
I'll admit I spent a bit of time on your homepage thinking "the use of those birds is a bit Twitter". Then I realised you're called App Canary.
edit: https://appcanary.com/vulns made my browser crawl. Please fix that page.
They really need a lot of work.
Tell this to the container community. They would have you believe containers are as secure as VMs.
[1] https://groups.google.com/forum/m/?fromgroups#!topic/comp.os...
http://dirtycow.ninja/
If only privileged users can SSH into my server, does this really affect me? In other words, the only users who can SSH in are already allowed to become root.
https://www.ubuntu.com/usn/usn-3105-1/
http://people.canonical.com/~ubuntu-security/cve/2016/CVE-20...
[0] https://github.com/dirtycow/dirtycow.github.io/wiki/Vulnerab...