> Container vulnerabilities are rarely related to memory bugs.

The easiest way to escape a container is through exploitation of the Linux kernel via a memory safety issue.

> C-level memory stuff is absolutely NOT the reason why virtualization is safer

Yes it is. The point of a VM is that you can remove the kernel as a trust boundary, since memory safety issues make the kernel incapable of reliably enforcing that boundary.

> but there's a lot of magical thinking around the word "safe"

There's no magical thinking on my part. I'm quite familiar with exploitation of the Linux kernel, container security, and VM security.

> the majority of CVEs I've seen in my career are not things that Rust would have prevented.

I don't know what your point is here. Do you spend a lot of time in your career thinking about hardening your containers against kernel CVEs?

t8sr|2 years ago

> I don't know what your point is here. Do you spend a lot of time in your career thinking about hardening your containers against kernel CVEs?

Yes, I literally led a team of people at a FAANG doing this.

You're saying the easiest way to escape a container is a vulnerability normally priced over 1 million USD. I'm saying the easiest way is through one of the million side channels.

insanitybit|2 years ago

OK, I apologize if I was coming off as glib or condescending. I will take your input into consideration.

I'm not looking to argue, I was just annoyed that I was getting so many of the same comments. It's too early for all of this negativity.

If you want to discuss this via an avenue that is not HN, I would be open to it. I'm not looking to make enemies here. I'd rather have an earnest conversation with a colleague than jump down their throat because they caught me in the middle of an annoying conversation.

xorcist|2 years ago

This discussion has been had a thousand times over, back when people said "chroot is not a security boundary". Now people say "containers are not a security boundary", but they mean essentially the same thing.

The thing is, chroots are pretty secure if you know what you're doing. As long as you run each process as a dedicated uid, with read-only filesystems and without access to /proc or /dev, you should be safe, barring any kernel exploit.
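As a rough illustration of the "dedicated uid" part, here is a minimal Python sketch of entering a chroot and then dropping root. The function name `enter_jail` and its parameters are made up for this example. It assumes the caller starts as root and that the jail directory is already prepared (read-only, no /proc or /dev inside), which the code does not check.

```python
import os


def enter_jail(root: str, uid: int, gid: int) -> None:
    """Confine the current process to `root`, then drop to a dedicated uid/gid.

    Hypothetical helper for illustration only. It does not mount anything
    read-only and does not verify that /proc and /dev are absent.
    """
    if uid == 0 or gid == 0:
        raise ValueError("refusing to stay root inside the chroot")

    # chroot needs root privileges, so it has to happen before the uid drop.
    os.chroot(root)
    # Move the working directory inside the new root so it no longer
    # refers to a location outside the jail.
    os.chdir("/")

    # Drop supplementary groups first, then the gid, then the uid.
    # Once setuid() succeeds the process can no longer change groups.
    os.setgroups([])
    os.setgid(gid)
    os.setuid(uid)
```

The order is the point: changing the uid first would leave the process unable to call chroot or shed its groups.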

The "know what you're doing" part was where the problems arose, and that's why chroot was considered insecure in practice. People generally put whole Linux installations in chroots, complete with bind mounts or suid binaries. Either could be a way to get open file handles outside your filesystem, which would make any namespaces a useless spectacle.

Containers are like that. I've seen people doing all sorts of crazy bind mounts, leaving the docker socket accessible, sharing filesystems, or running processes as root.
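To make that list concrete, here is a small Python sketch that flags those patterns in a container launch description. `ContainerSpec`, `audit`, and the list of sensitive paths are all hypothetical and simplified for this example; this is not any real tool's configuration format.

```python
from dataclasses import dataclass, field
from typing import List

DOCKER_SOCKET = "/var/run/docker.sock"
# Assumed examples of host paths that are risky to share with a container.
SENSITIVE_HOST_PATHS = ("/proc", "/dev", "/sys", "/etc")


@dataclass
class ContainerSpec:
    """Hypothetical, simplified description of how a container is launched."""
    user: str = "root"  # Docker runs as root unless told otherwise
    bind_mounts: List[str] = field(default_factory=list)  # host-side paths


def _is_under(path: str, parent: str) -> bool:
    """True if `path` is `parent` itself or lies below it."""
    return path == parent or path.startswith(parent + "/")


def audit(spec: ContainerSpec) -> List[str]:
    """Return human-readable findings for the misconfigurations named above."""
    findings = []
    if spec.user in ("root", "0"):
        findings.append("process runs as root")
    for path in spec.bind_mounts:
        if path == DOCKER_SOCKET:
            findings.append("Docker socket is mounted: " + path)
        elif path == "/" or any(_is_under(path, p) for p in SENSITIVE_HOST_PATHS):
            findings.append("sensitive host path is bind-mounted: " + path)
    return findings
```

For example, `audit(ContainerSpec(bind_mounts=["/var/run/docker.sock"]))` reports both the root user and the socket mount, while a spec with a non-root user and only an application data directory mounted reports nothing.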

Kernel exploits are something else. They exist too, and they are something you would, at least in theory, patch once they become known. But the side channels are a hundred times more prevalent in any containerized workload that I've seen.

Most kernel exploits are also related to device drivers or file systems, which are often written by third parties. Microkernels were said to contain those by running most of them as processes. That's a good idea, at least in theory. In practice it's tricky because you are dealing with buggy hardware that has DMA access. Any mismatch between a driver and the hardware state risks a system hang, data loss, or a security exploit.

nonameiguess|2 years ago

You two seem to have figured this out, but as far as I can tell, the disconnect here is that the vast majority of security issues related to the difference in separation between VMs and containers aren't due to container "escapes" at all. They're due to the defaults of the application you're running, which assumes it's the only software on the system and can run with any and all privileges. Lazy developers don't give you containers that work without running as privileged. Demand from users to keep using that application after migrating from a primarily VM-based IT infrastructure to a primarily container-based one is too great to simply tell them no, and if it's free software, you have no ability to tell the developers to do anything differently.

Discussions on Hacker News understandably lean toward the concerns of application developers, and especially greenfield projects run by startups who can take complete control of the full stack if they want to. But running applications on resources partially shared with other applications encompasses a hell of a lot of other scenarios. Think of some bank or military department that has to self-host ADP, Rocket Chat, a Git server, M365, and the hundreds of other company-wide collaboration tools the employees need. Do you do it on VMs or containers? If the application in question inherently assumes it is running on its own server as root, the answer to that question doesn't really depend on kernel CVEs potentially allowing for container escapes.

If we're just reasoning from first principles, applications in containers on the same host OS share more of a common attack surface than applications in VMs on the same physical host, and those share more than applications running on separate servers in the same rack, which in turn share more than servers in separate racks, which in turn share more than servers in separate data centers. The potential layers of separation can be nearly endless, but there is a natural hierarchy in which containers will always sit below VMs, regardless of the kernel providing the containers.

Even putting that aside, if we're going to frame a choice here, these are not exactly kernels on equal footing. A kernel written in C that has existed for more than three decades and is used on probably billions of devices by everything from hobbyists to militaries to Fortune 50 companies to hospitals to physics labs is very likely to be safer on any realistic scale than a kernel written in Rust by one college student in his spare time that is tested on Qemu. The developer himself tells you not to use this in production.

I think the annoyance here is that, when reading Hacker News, it often feels like a lot of users treat static typing and borrow checking as magic that automatically guarantees a security advantage. Imagine we lived in the Marvel Multiverse and vibranium was real. It might provide a substrate with which it is possible to create stronger structures than real metals, but does that mean you'd rather fly in an aircraft that Riri Williams built in her parents' garage at 17, or would you rather trust Boeing and whatever plain-ass alloy, with all its physical flaws, they put into a 747? Maybe it's a bad analogy because vibranium pretty much is magic, but there is no magic in the real world.