Execute Docker Containers as QEMU MicroVMs

riobard|4 years ago

A few years ago I invested in a small startup called `hyper.sh`. It open-sourced a container runtime called `runV` that provided exactly this: the security of virtual machines plus the convenience of containers.

The project later merged with Intel Clear Containers to become what is now called Kata Containers (https://katacontainers.io/), which is now widely used by several Internet giants such as Alibaba and Baidu.

The startup was acquired by Ant Financial a couple of years ago.

(I recorded a podcast with one of hyper.sh's engineers, if you understand Mandarin: https://pan.icu/25)

temp_praneshp|4 years ago

Probably off topic: back in 2014-15, at my first job, when I was working on OpenStack, they used to show up at the summits. They were super smart and very generous with their time when I had questions. Sometime in 2020 I wondered what had happened to them; I'm happy they had a decent exit.

XorNot|4 years ago

I used runV with drone.io (on top of Media) to run distributed, on-demand VM builders for GitHub Enterprise (we were building physical machine images to deploy, so we needed VM isolation).

It actually worked great, and I've struggled to get quite as flexible a CI system at other jobs since then. The big advantage was that it looked like Docker, so with Compose you could either spin up a metal-like nested VM or just pull some DB containers into your build instance.

cptnapalm|4 years ago

I was looking at Kata Containers a few days ago. I'm pretty new to using VMs/containers for services; purely hobby level. I couldn't figure out how to use them, but that's not necessarily a knock on them, as I also can't get WireGuard on OpenBSD to work either.

polskibus|4 years ago

How does it differ from Firecracker?

lifty|4 years ago

I worked with their tech, testing it, and I loved the product. It was definitely ahead of its time. Similar in some ways to what Fly is doing these days, without the edge.

eatonphil|4 years ago

There are a few existing projects out there like this (running Docker images as virtual machines, specifically) if folks are interested. Slim [0] is the one I can remember off the top of my head. I think there are a couple more.

Still, neat to have the walkthrough here in this post.

[0] https://github.com/ottomatica/slim

hardwaresofton|4 years ago

A couple more:

https://github.com/containers/krunvm

https://github.com/weaveworks/ignite

tptacek|4 years ago

As I understand the landscape here, the big enabling win of microVMs is faster boot time; there's a cool qemu-lite slide deck that goes into detail about how they cut down boot time:

https://www.linux-kvm.org/images/d/d2/03x05B-Chao_Peng-Light...

The big win was slashing away the BIOS stuff.

We use AWS's Firecracker to turn our customers' Docker containers into Firecracker microVMs (Firecracker is Amazon's Rust VMM, the engine behind Fargate and Lambda). Anecdotally, in my dev environment the difference between Firecracker boot times and native Docker container startup is imperceptible; the logging we do swamps the VM boot time. It's very fast.

rwmj|4 years ago

https://katacontainers.io/ ?

bonzini|4 years ago

Yes, indeed. However it's nice to see directly the mechanisms that let Kata do its magic.

ashishbijlani|4 years ago

> Can we somehow combine the advantages of the docker ecosystem with VMs?

Shameless plug: this is exactly our goal with https://kwarantine.xyz. We are creating a new hypervisor (from scratch) that can run strongly isolated Docker/LXC containers.

amscanne|4 years ago

The "fork" sounds like you blue-pill the OS for each container? I'm assuming the concept is like Cappsule [1] or Bromium [2]?

[1] https://cappsule.github.io/ [2] https://en.wikipedia.org/wiki/Bromium#/media/File:Bromium-en...

mikepurvis|4 years ago

Is this what gVisor is? https://github.com/google/gvisor

stefanha|4 years ago

For an even more lightweight approach to running containers in VMs see: https://github.com/containers/krunvm

It's powered by https://github.com/containers/libkrun.

forty|4 years ago

Isn't firecracker an AWS tech?

bhawks|4 years ago

If you're splitting hairs, Firecracker (AWS) is an offshoot of crosvm from Chrome OS (Google), which actually was a greenfield VMM :) Anyway, memory-safe virtualization for the win.

cpach|4 years ago

That’s correct.

https://github.com/firecracker-microvm/firecracker

jjacobson93|4 years ago

Yeah, the author is incorrect. Fly.io uses Firecracker but they didn’t create it.

thekevjames|4 years ago

I had fun exploring Docker->VM conversion a while back [1], though the larger goal in my case was to simplify the build path for custom GCP VM images. Exciting to see other cases where folks are finding this sort of flow useful!

1: https://thekev.in/blog/2019-08-05-dockerfile-bootable-vm/ind...

dzonga|4 years ago

I understand that it's cool to do content marketing, but folks, proofread your articles. Firecracker was created by AWS, and its own page rightly says so.

OldGoodNewBad|4 years ago

I think a lot of folks are going out of their way to misunderstand what happened. Yes, there are other similar projects and containers. No, none come from a long-established COMMUNITY-RUN PROJECT. This is akin to the difference between VirtualBox and OpenBSD's vmd: one's a product with a "free" tier, the other is a community project.

gravypod|4 years ago

Something I'd be very interested in: building a PXE image from something declarative like Dockerfiles.

justincormack|4 years ago

Try LinuxKit https://github.com/linuxkit/linuxkit

laurencerowe|4 years ago

Google Container-Optimized OS is basically this, I think. It's what's used when you start a GCE instance with a Docker image.

https://cloud.google.com/container-optimized-os/

jonjonsonjr|4 years ago

I don't think I'd ever call a Dockerfile declarative.

encryptluks2|4 years ago

Why not run containers in VMs in containers in VMs? :)

Seriously, VMs are hardly as secure as many people want to believe unless you're using enclaves, and even those have vulnerabilities. I think a better approach is seccomp plus whatever other filtering makes sense.

handrous|4 years ago

A while back I did some looking at FreeBSD jails to try to figure out why they don't have more mindshare (especially when paired with the nigh-superpower-granting ZFS).

I came away baffled that they weren't more widely promoted compared with Docker and friends. After thinking about it for a while, all I can figure is that they're so straightforward to use and so well documented that there's no room to make one's name, or to make a buck, by repackaging them or wrapping them in complex tools. So there's little money or glory (= personal marketing via open-source project leadership/contributions) in promoting them.

[EDIT] that is: what would be a blog post in LXC/Docker land... doesn't exist, because it's covered perfectly well in the docs. What would be a simple open-source tool... becomes a blog post, because it's short, simple, and clear enough not to merit special software, but just a quick guide to existing tools. What would be a business, becomes a simple open-source tool without enough of a difficulty/convenience "moat" to support a business.

tptacek|4 years ago

I don't know what people generally believe.

But the attack surface of a Linux kernel is very large, is pretty unpredictable, and can't be coherently masked out with rules (my favorite example is Jann Horn's VM reference count bug, which was a simple concurrency flaw in the core virtual memory system). By comparison, a Linux KVM hypervisor is not just a subset of the kernel by definition, but also a much smaller codebase, a tiny fraction of the whole kernel.

Replacing shared-kernel isolation like seccomp-filtered containers with VMs is, architecturally, simply the replacement of a large trusted computing base with a smaller one. If the overhead is acceptable, it's hard to argue with from a security perspective.

gorkish|4 years ago

OK; https://github.com/harvester/harvester

Security and performance aren't the only driving forces; there are a lot of technical and operational benefits to the abstraction and standard interfaces that you get when running stacks that might otherwise look like someone took an Xzibit meme too far.

Also remember that on a modern system there are often at least two additional layers already at work abstracting the interfaces to the "bare metal" OS.

riobard|4 years ago

That's the approach taken by Google's gVisor (at the cost of I/O and network performance).

dboreham|4 years ago

Machine Turducken.

63 comments