I think what Intel are doing with Clear Containers is really interesting. They are encapsulating containers inside VMs, avoiding the security problems of containers.
To do this efficiently they've had to make a bunch of changes on the VM side so the overhead is much smaller than an ordinary VM (of the order of 150ms and 20MB of RAM). I've also been looking at this and am hoping to give a talk about it at the KVM Forum in August (http://events.linuxfoundation.org/events/kvm-forum).
Containerization in Linux is fugly. There is no core concept of containers in the kernel, you just have a set of loosely integrated namespaces abused by the likes of lx[cd] and docker.
I don't share your opinion. The kernel exposes a collection of primitives (including but not limited to cgroups, namespaces, and copy-on-write storage [1]) which can be used to create isolated sandboxes. The kernel itself doesn't bind the primitives together because, I believe, Linus would consider that "user space"...and I would agree.
Instead this is left up to other tools like LXC. Note also that higher-level features such as network support are likewise left to the higher-level tool.
Docker and LXC have core differences in their vision of what a container should be [2]. Docker used to be based on LXC, but has since written its own library, libcontainer, which handles the interaction with the kernel primitives.
To me, Docker's philosophy and libcontainer implementation are...as you say, fugly, but LXC's approach and implementation are not.
I also don't think of the kernel exposing primitives and letting user-space tools bind them together as inherently bad. I actually prefer it this way and think it leaves the kernel cleaner/leaner/better off.
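The "loose primitives" model is easy to see from user space: the kernel exposes each process's namespace memberships as symlinks under /proc/&lt;pid&gt;/ns, and tools like LXC and libcontainer compose containers by manipulating these with unshare()/setns(). A minimal, Linux-only sketch (standard procfs paths, nothing hypothetical):

```python
import os

def list_namespaces(pid="self"):
    """Return a mapping of namespace type -> kernel identifier for a process.

    Each entry under /proc/<pid>/ns is a symlink like 'pid:[4026531836]';
    two processes share a namespace iff their identifiers match.
    """
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):           # non-Linux, or masked procfs
        return {}
    result = {}
    for name in os.listdir(ns_dir):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            result[name] = None             # e.g. permission denied
    return result

if __name__ == "__main__":
    for ns, ident in sorted(list_namespaces().items()):
        print(ns, ident)
```

On a typical kernel this lists entries such as mnt, net, pid, and uts. A container runtime "creates a container" simply by launching a process whose identifiers here differ from the host's; there is no single container object in the kernel for it to create.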
Hi Jesse! We got an audit from you last year (I was the one who pushed to get you to specifically look at our PaaS containerization). As a result I spent a while wrangling with Ubuntu LXC unprivileged containers, and now I know more about cgmanager than I wanted to.
Glad to see you've added a lot of detail to your research. It's very necessary!
Very thought-provoking whitepaper. As someone who has been working on securing containers for the past year or so, it gave me some additional avenues to pursue.
>As such, it discloses the names and PIDs of all processes running on the system...
So I don't really see how this is considered a big vulnerability, unless the goal is security by obscurity, but then we could go even further and obfuscate the whole system.
>NET_RAW abuse
Hard to blame LXC/Docker for something that has to do with the configuration of the bridge, plus for some setups this is desired functionality.
>DoS
Some of these are interesting, but I don't see how filling up the disk space is a problem with containers rather than with operating systems in general, and I feel like a lot of these DoS attacks are just basic OS limitations, though I don't know enough to make an informed statement.
Security by obscurity means relying purely on nondisclosure of information; minimizing information leakage is sound practice. PIDs, process names, etc. can give a lot of information about the configuration of the app running in your container: how often external processes are run, potentially vulnerable software you may be using in utilities (such as an old version of ImageMagick), and so on. While there's no substitute for keeping your system up to date, frustrating an attacker's ability to gather information on your system is also pretty standard practice.
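The scale of that leak is easy to demonstrate: without the hidepid= mount option on procfs, any unprivileged process can enumerate every PID and command name on the system. A small sketch of the reconnaissance described above:

```python
import os

def visible_processes():
    """Enumerate the PIDs and command names visible via /proc.

    On a shared (non-hidepid) procfs this reveals every process on the
    host -- exactly the information-disclosure surface discussed above.
    """
    procs = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                procs[int(entry)] = f.read().strip()
        except OSError:
            pass                    # process exited, or access denied
    return procs

if __name__ == "__main__":
    procs = visible_processes()
    print(f"{len(procs)} processes visible")
```

Run inside a container whose /proc is shared with the host, this maps out the host's workload; with `mount -o remount,hidepid=2 /proc` (or a properly isolated PID namespace), it sees only the container's own processes.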
Regarding NET_RAW, this is a case where you want reasonable defaults. Needing raw sockets is an exceptional condition for most container setups, and allowing them again gives greater threat exposure. Even ignoring the potential for things like ARP spoofing, filling up the MAC table on a lot of switches makes them fail over into being essentially rackmount hubs, which allows for even greater amounts of service denial and information leakage.
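For what it's worth, both runtimes can already be told to drop the capability; the defaults are the problem. A configuration sketch (the image name is a placeholder; the flags and config key are the real ones):

```shell
# Docker: drop CAP_NET_RAW for a single container
docker run --cap-drop=NET_RAW myimage

# LXC: drop it in the container's config file
# (e.g. /var/lib/lxc/<name>/config)
lxc.cap.drop = net_raw
```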
Filling up disk space is an area that is problematic with Linux-based containers because, to keep a runaway or malicious process from using up all disk space, you have to do things like set up fixed-size loopback filesystems ahead of time, which impose performance and space constraints that make your containers less flexible than Solaris zones, for example. Under ZFS, you can directly configure a container to only be able to use a given amount of space, without needing to set up loopback devices or other complexities. This lets you set limits, but at the same time means that if a dataset needs more, a single command gives it more space.
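The contrast is concrete. On the loopback approach you must pick the size up front; the ZFS quota is one property change that can be raised later. A sketch (the image path and dataset name are illustrative; the mkfs/mount/zfs steps need root and are shown as comments):

```python
import os

IMG = "/tmp/container-disk.img"
SIZE = 512 * 1024 * 1024            # 512 MiB, fixed at creation time

# 1. Pre-allocate a fixed-size (sparse) image file -- the size must be
#    chosen ahead of time, which is exactly the inflexibility noted above.
with open(IMG, "wb") as f:
    f.truncate(SIZE)

print(os.path.getsize(IMG))         # -> 536870912

# 2. The remaining loopback steps need root:
#      mkfs.ext4 /tmp/container-disk.img
#      mount -o loop /tmp/container-disk.img /var/lib/lxc/web/rootfs
#
# 3. Contrast with ZFS, where the cap is a single property and can be
#    raised later without re-provisioning anything:
#      zfs set quota=512M tank/containers/web
```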
Yes, a lot of these issues can be easily mitigated; however, they're all symptoms of poor defaults. A good container system should help manage and mitigate these sorts of issues, so they only need to be thought about once, instead of by everyone implementing them.
Linux does not have containers. It has namespaces and cgroups. Jails (FreeBSD) and zones (Illumos) are containers. Please, stop claiming containers exist on Linux.
[1] http://www.slideshare.net/jpetazzo/anatomy-of-a-container-na...
[2] https://www.flockport.com/lxc-vs-docker/
ck2 | 9 years ago:
Hope there was previous disclosure.