Show HN: Sub-millisecond VM sandboxes using CoW memory forking
310 points | adammiribyan | 12 days ago | github.com
So instead of launching a new microVM per execution, I boot Firecracker once with Python and numpy already loaded, then snapshot the full VM state. Every execution after that creates a new KVM VM backed by a `MAP_PRIVATE` mapping of the snapshot memory, so Linux gives me copy-on-write pages automatically.
That means each sandbox starts from an already-running Python process inside a real VM, runs the code, and exits.
These are real KVM VMs, not containers: separate guest kernel, separate guest memory, separate page tables. When a VM writes to memory, it gets a private copy of that page.
The hard part was not CoW itself. The hard part was resuming the snapshotted VM correctly.
Rust, Apache 2.0.
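The copy-on-write behaviour described above can be illustrated in user space. The sketch below is not the project's code. It only shows what `MAP_PRIVATE` gives you, using a temporary file as a stand-in for the snapshot memory file and declaring `mmap`/`munmap` directly against libc (64-bit Linux assumed) so it needs no crates. `CowMemory` and `fork_from` are hypothetical names. In a real VMM, each mapping's address would then be registered with KVM as guest memory (for example via the `KVM_SET_USER_MEMORY_REGION` ioctl), which is omitted here.

```rust
use std::fs::File;
use std::io;
use std::os::unix::io::AsRawFd;

// Linux mmap constants (same values on x86_64 and aarch64).
const PROT_READ: i32 = 0x1;
const PROT_WRITE: i32 = 0x2;
const MAP_PRIVATE: i32 = 0x02;

// Declared directly so the example needs no crates; `offset` is off_t,
// which is 64-bit on 64-bit Linux.
extern "C" {
    fn mmap(addr: *mut u8, len: usize, prot: i32, flags: i32, fd: i32, offset: i64) -> *mut u8;
    fn munmap(addr: *mut u8, len: usize) -> i32;
}

/// Hypothetical: one sandbox's private copy-on-write view of snapshot memory.
pub struct CowMemory {
    ptr: *mut u8,
    len: usize,
}

impl CowMemory {
    /// Map `len` bytes of `snapshot` with MAP_PRIVATE. Untouched pages are
    /// shared with every other mapping; the first write to a page copies it.
    pub fn fork_from(snapshot: &File, len: usize) -> io::Result<CowMemory> {
        let prot = PROT_READ | PROT_WRITE;
        // SAFETY: asks the kernel for a fresh mapping (addr = null) of an open fd.
        let ptr = unsafe {
            mmap(std::ptr::null_mut(), len, prot, MAP_PRIVATE, snapshot.as_raw_fd(), 0)
        };
        if ptr as usize == usize::MAX {
            // MAP_FAILED is (void *)-1
            return Err(io::Error::last_os_error());
        }
        Ok(CowMemory { ptr, len })
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: ptr/len describe a live mapping owned by self.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: as above, and &mut self guarantees exclusive access.
        unsafe { std::slice::from_raw_parts_mut(self.ptr, self.len) }
    }
}

impl Drop for CowMemory {
    fn drop(&mut self) {
        // Unmapping discards whatever pages this sandbox dirtied.
        unsafe {
            munmap(self.ptr, self.len);
        }
    }
}

fn main() -> io::Result<()> {
    // Stand-in "snapshot": two 4 KiB pages filled with 0xAA.
    let path = std::env::temp_dir().join("cow_snapshot_demo.bin");
    std::fs::write(&path, vec![0xAAu8; 8192])?;
    // A read-only fd is enough for a writable MAP_PRIVATE mapping.
    let snapshot = File::open(&path)?;

    let mut vm_a = CowMemory::fork_from(&snapshot, 8192)?;
    let vm_b = CowMemory::fork_from(&snapshot, 8192)?;
    vm_a.as_mut_slice()[0] = 0x55; // VM A dirties its first page

    let on_disk = std::fs::read(&path)?[0];
    println!(
        "A sees {:#x}, B sees {:#x}, file has {:#x}",
        vm_a.as_slice()[0],
        vm_b.as_slice()[0],
        on_disk
    );
    std::fs::remove_file(&path)?;
    Ok(())
}
```

Writing to one mapping changes only that mapping: the other "sandbox" and the file on disk still see the original byte, and dropping a `CowMemory` throws its private pages away. Reads of untouched pages are served from the shared page cache, which is why each fork is cheap.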
cperciva | 12 days ago
The firecracker team wrote a very good paper about addressing this when they added snapshot support.
adammiribyan | 12 days ago
Retr0id | 12 days ago
injidup | 12 days ago
benterix | 12 days ago
CTDOCodebases | 12 days ago
BornaP | 11 days ago
If you're building agents on Windows and want to give it a spin, reach out for early access.
ddtaylor | 11 days ago
RyleHisk | 10 days ago
ivan_burazin | 11 days ago
BornaP | 11 days ago
The tricky part we keep running into with agent sandboxes is that code execution is just one piece, because agents also need file system access, memory, git, a pty, and a bunch of other tools, all wired up and isolated together. That's where things get hairy fast.
unknown | 7 days ago
[deleted]
jamiemallers | 10 days ago
[deleted]
crawshaw | 12 days ago
That said, I have seen several use cases where people want a VM for something minimal, like a Python interpreter, and this is absolutely the sort of approach they should be using. Lots of promise here, excited to see how far you can push it!
hrmtst93837 | 12 days ago
Fast startup is nice. If the workload is "run plain Python on a trusted codebase", you win, but once it gets hairier, the maintenance overhead sends you straight back to yak shaving.
indigodaddy | 12 days ago
skwuwu | 12 days ago
adammiribyan | 12 days ago
vmg12 | 12 days ago
Even more than the sub-ms startup time, the 258 KB of RAM per VM is huge.
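With `MAP_PRIVATE` copy-on-write, a sandbox's extra memory is roughly the number of pages it has dirtied times the page size, so a per-VM figure like this can be checked from the host. A minimal sketch, assuming guest memory lives in a `MAP_PRIVATE` mapping inside the VMM process as the original post describes: sum the `Private_Dirty` fields of `/proc/<pid>/smaps` (or `smaps_rollup`, available since Linux 4.14). `private_dirty_kb` is a hypothetical helper, and the demo reads its own process purely for illustration.

```rust
use std::fs;

/// Hypothetical helper: sum the `Private_Dirty:` lines (reported in kB) of a
/// /proc/<pid>/smaps-style text. For a MAP_PRIVATE mapping these are the
/// pages the process has actually written to and therefore copied.
fn private_dirty_kb(smaps: &str) -> u64 {
    smaps
        .lines()
        .filter_map(|line| line.strip_prefix("Private_Dirty:"))
        .filter_map(|rest| rest.split_whitespace().next())
        .filter_map(|n| n.parse::<u64>().ok())
        .sum()
}

fn main() {
    // smaps_rollup aggregates all mappings of a process; point this at the
    // VMM's pid instead of `self` to inspect a real sandbox.
    match fs::read_to_string("/proc/self/smaps_rollup") {
        Ok(text) => println!("private dirty: {} kB", private_dirty_kb(&text)),
        Err(e) => eprintln!("could not read smaps_rollup: {}", e),
    }
}
```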
adammiribyan | 12 days ago
On Firecracker version: tested with v1.12, but the vmstate parser auto-detects offsets rather than hardcoding them, so it should work across versions.
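The comment doesn't say how the offsets are detected. One plausible approach, sketched here purely as an assumption, is to locate a field by searching for a known byte signature rather than indexing at a fixed position, so the parser keeps working when a new Firecracker release shifts the layout. `find_offset`, `read_u64_after`, and the signature bytes are all hypothetical; the real vmstate format is not modelled.

```rust
/// Hypothetical: byte offset of the first occurrence of `signature` in `blob`.
fn find_offset(blob: &[u8], signature: &[u8]) -> Option<usize> {
    if signature.is_empty() || signature.len() > blob.len() {
        return None;
    }
    blob.windows(signature.len()).position(|w| w == signature)
}

/// Hypothetical: read the little-endian u64 stored right after `signature`.
/// Returns None if the signature is missing or the field is truncated.
fn read_u64_after(blob: &[u8], signature: &[u8]) -> Option<u64> {
    let start = find_offset(blob, signature)? + signature.len();
    let field = blob.get(start..start + 8)?;
    let mut buf = [0u8; 8];
    buf.copy_from_slice(field);
    Some(u64::from_le_bytes(buf))
}

fn main() {
    // Two toy layouts: the field moved by 3 bytes between "versions",
    // but locating it by signature finds it in both.
    let v1: [u8; 10] = [0xDE, 0xAD, 0x2A, 0, 0, 0, 0, 0, 0, 0];
    let v2: [u8; 13] = [7, 7, 7, 0xDE, 0xAD, 0x2A, 0, 0, 0, 0, 0, 0, 0];
    for blob in [&v1[..], &v2[..]].iter() {
        println!("{:?}", read_u64_after(blob, &[0xDE, 0xAD]));
    }
}
```

The trade-off is that a signature search can match the wrong bytes by accident, so a real parser would want a signature that is unlikely to appear elsewhere, or a sanity check on the value it reads.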
buckle8017 | 12 days ago
hnperu5 | 12 days ago
[deleted]
indigodaddy | 12 days ago
https://codesandbox.io/blog/how-we-clone-a-running-vm-in-2-s...
Are there parallels?
CompuIves | 11 days ago
The first version we launched used the exact same approach (MAP_PRIVATE). Later on, we bypassed the file system by using shared memory and using userfaultfd because ultimately the NVMe became the bottleneck (https://codesandbox.io/blog/cloning-microvms-using-userfault... and https://codesandbox.io/blog/how-we-scale-our-microvm-infrast...).
deivid | 12 days ago
jauntywundrkind | 12 days ago
I keep thinking I need to see whether CRIU (Checkpoint/Restore In Userspace) would work here, so I can put work down for longer stretches and close & restore instances more or less on demand.
I don't really love the idea of using VMs more, but I super love this project. Heck yes forking our processes/VMs.
indigodaddy | 12 days ago
https://GitHub.com/jgbrwn/vibebin
adammiribyan | 12 days ago
aftbit | 11 days ago
adammiribyan | 11 days ago
indigodaddy | 12 days ago
dizhn | 12 days ago
aa-jv | 11 days ago
Now I want the ability to freeze the VM cryogenically and move it to another machine automagically, defrosting and running as seamlessly as possible.
I know this is gonna happen soon enough; I've been waiting for just this feature since the death of TandemOS...
diptanu | 12 days ago
Rygian | 12 days ago
indigodaddy | 12 days ago
https://codesandbox.io/blog/how-we-clone-a-running-vm-in-2-s...
adammiribyan | 12 days ago
unknown | 12 days ago
[deleted]
latortuga | 12 days ago
polskibus | 11 days ago
yagizdagabak | 12 days ago
adammiribyan | 12 days ago
adammiribyan | 12 days ago
huksley | 11 days ago
wang_cong | 10 days ago
adammiribyan | 9 days ago