
Run end-to-end tests faster with Firecracker

204 points | samanthachai | 4 years ago | webapp.io | reply

59 comments

[+] kami8845|4 years ago|reply
A few issues I have with this blog post:

1. It doesn't show off the unique capabilities of Firecracker very well.

2. The comparison is not very fair.

2a. The docker-build step (which dominates the runtime) is run without any caching; just by adding two lines to your build-push-action ("cache-from: type=gha" and "cache-to: type=gha,mode=max") you can make it a lot faster.
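In workflow terms, those two cache lines slot into the build step roughly like this (a sketch, assuming docker/build-push-action v2+ with a preceding docker/setup-buildx-action step; the image name is a placeholder):

```yaml
# Hypothetical GitHub Actions build step; "example/app:latest" is a placeholder.
- name: Build and push
  uses: docker/build-push-action@v2
  with:
    push: true
    tags: example/app:latest
    cache-from: type=gha
    cache-to: type=gha,mode=max
```

The `type=gha` backend stores BuildKit layer cache in the GitHub Actions cache service, so unchanged layers are reused across runs instead of being rebuilt.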

2b. ~1m20s of the time is just "VM start". GitHub Actions has had a rough time recently, but you should never wait that long for your CI to start in day-to-day operation.

2c. The tests are unrealistically short at 20s, which allows the author to get to their 10x-faster number.

Let's say the GitHub Action starts in 5 seconds, the GitHub Actions cache reduces the build time to 2 minutes and the tests take 10 minutes to run. Now Firecracker is 20% faster ...
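Working that scenario through (all figures are the hypothetical ones above, not measurements):

```shell
# Back-of-the-envelope check of the scenario above, in seconds.
gha_total=$((5 + 120 + 600))   # runner start + cached build + tests
fc_total=600                   # snapshot restore is ~instant; tests dominate
faster_pct=$(( (gha_total - fc_total) * 100 / fc_total ))
echo "GHA: ${gha_total}s vs Firecracker: ${fc_total}s (~${faster_pct}% faster)"
```

Once the tests themselves dominate, the startup advantage shrinks to a rounding error.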

You can also get comparable performance out of https://buildkite.com/, which lets you self-host runners on AWS, meaning you're almost guaranteed to get a hot Docker cache (running against locally attached SSDs). You can then start running your tests (almost) as fast, with much more mature tooling.

[+] shoo|4 years ago|reply
> You can also get comparable performance out of https://buildkite.com/ which lets you self-host runners on AWS

You can self-host GitHub runners as well, with a few caveats; the most serious one is that you are then responsible for cleaning up the state of your self-hosted runner between runs.
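One way to sidestep the cleanup problem is to register each runner as ephemeral, so it takes exactly one job, deregisters, and you recycle the whole machine before re-registering. A sketch (the repo URL and registration token are placeholders):

```shell
# Register a self-hosted runner that accepts a single job, then deregisters.
# ORG/REPO and REGISTRATION_TOKEN are placeholders for your own values.
./config.sh --url https://github.com/ORG/REPO \
            --token REGISTRATION_TOKEN \
            --ephemeral --unattended
./run.sh   # exits after one job; rebuild or reimage the VM before re-registering
```

This gets you close to the "fresh machine per run" guarantee without relying on in-place cleanup scripts.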

https://docs.github.com/en/actions/hosting-your-own-runners/...

Structural isolation guarantees of the form "build execution during run N cannot possibly impact build execution of run N+1" are tremendously helpful: they reduce the number of weird CI failures and the cost to triage and fix each one (by shrinking the space of possible interactions). If you cannot offer similar guarantees when self-hosting your own CI infrastructure, it may not be wise to self-host.

[+] colinchartier|4 years ago|reply
I tried to get Docker layer caching working within GHA for a second benchmark, but it seems like none of the approaches work particularly well for a "docker-compose build". I'd happily amend the post with a second benchmark if you wouldn't mind opening a PR based on the existing one [1].

https://github.com/webappio/livechat-example/blob/be7c9121c1...

The point still stands for 2c: you can super easily parallelize with Firecracker (by taking a snapshot of the state right before the tests run, then loading it a bunch of times).
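Against Firecracker's API socket, that snapshot-then-fan-out cycle looks roughly like this (a sketch; socket and file paths are placeholders, and it assumes a Firecracker version with the snapshot API):

```shell
# Pause the warmed-up VM and take a full snapshot.
curl --unix-socket /tmp/fc.sock -X PATCH http://localhost/vm \
     -d '{"state": "Paused"}'
curl --unix-socket /tmp/fc.sock -X PUT http://localhost/snapshot/create \
     -d '{"snapshot_type": "Full", "snapshot_path": "snap.file", "mem_file_path": "mem.file"}'

# For each parallel test shard, start a fresh firecracker process on its own
# socket and restore from the same snapshot:
curl --unix-socket /tmp/fc-shard1.sock -X PUT http://localhost/snapshot/load \
     -d '{"snapshot_path": "snap.file", "mem_backend": {"backend_type": "File", "backend_path": "mem.file"}, "resume_vm": true}'
```

Because every shard resumes from the identical pre-test memory state, the per-shard setup cost is just the restore time.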

[+] huijzer|4 years ago|reply
I have to mention GitLab here. Their runners are extremely easy to self-host.
[+] CraigJPerry|4 years ago|reply
There’s an even faster strategy than this, and it’s easier to set up.

You’re going to deploy 4 CI pipelines (so make sure you’re not manually putting together CI pipeline configs; use automation):

Pipeline 1: A conveyor belt of environments. All this pipeline does is spin up fresh environments and run a short automated smoke test. Hydrate the env with the most recent mask from prod. The trigger condition is that there are fewer than <Threshold> environments available. I picked 8 on a whim and never saw a need to change it.

Pipeline 2: Normal garden-variety CI pipeline triggered on merges to main. Its output is two persisted artifacts: a built package and your unit test evidence.

Pipeline 3: Test your automated deployment by deploying the package built in #2 into the first of the queue of free envs from #1, then trigger your end-to-end, integration, and contract tests. Don’t run your security or operability tests here.

Pipeline 4: Async pipeline triggered on a 6-hour schedule. Do your long-running stuff, like fuzz testing and security tests, here, outside of the dev cycle.
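Pipeline 1's trigger condition might be sketched like this (every command here is a hypothetical placeholder for whatever provisioning tooling you use):

```shell
# Keep a pool of at least THRESHOLD ready-to-use environments.
# `list-free-envs`, `provision-env`, `smoke-test`, and `destroy-env`
# are hypothetical helpers standing in for your own tooling.
THRESHOLD=8
while [ "$(list-free-envs | wc -l)" -lt "$THRESHOLD" ]; do
  env_id=$(provision-env --hydrate-from latest-prod-mask)
  smoke-test "$env_id" || destroy-env "$env_id"
done
```

The point is that environment creation runs continuously in the background, so pipeline 3 never pays the bring-up cost.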

Release candidates can only be signed after a successful run through 2, 3, and 4. That means prod deploys happen on a predictable cadence, which users and ops usually appreciate, rather than "we drop it in when it’s ready."

The DevEx is pretty sweet: you don’t see pipeline 1 or 4 in your build loop. Only the runtime of 3 is comparable to the article, and slightly faster, because there’s no Firecracker bring-up overhead, however small that is.

[+] drjasonharrison|4 years ago|reply
There are times when some corner of software development speaks a specialized language and this is an example.

1. Conveyor belt(?) of environments. Hydrate(?) the env(ironment). Mask(?) from prod(uction)

2. I think I got this. Typical "merge to main pipeline" with built product and test results as outputs.

DevEx(?). And I'm not sure why I wouldn't see pipeline #4 in my build loop, since I can't deploy unless 2, 3, and 4 pass... Maybe you mean I don't wait for it.

I'm also not sure how it's faster, because environments still need to be brought up. Unless you are saying that the environment is already running by the time the merge-to-master pipeline succeeds.

[+] ReganLaitila|4 years ago|reply
May I ask what stack you employ to meet these goals?

Many tend to reach for GitLab CI or GitHub Actions, but these piles of "executable YAML" never appear to be up to the complex deployment logic you describe in your post, not to mention that they don't naturally account for multi-repo or composed-artifact workflows. The state of the art, if you can call it that, is Jenkins, where you can drop into raw-ish Groovy/Java for the logic pieces when you need to. But then you run into the constant struggle of working around Jenkins's leaky abstractions and peculiarities.

You can patch together a pile of bash, Python, Go, et al., but you land in a worse place, where there is no guiding structure to the automation for onboarding, enhancement, and maintenance.

I'm curious about others' experiences building complex build/deployment pipelines where, up front, you have a consistent entry structure to the automation, but with all the escape hatches one would need to implement custom logic when required, in a type-safe, potentially compiled, testable way (i.e., pipelines as 'actual' code).

Of course, one could write one's own automation engine to avoid YAML hell and all that. However, I'm not seeing any pervasive solutions that don't amount to "yet another (YAML | JSON | XML | CUE | whatever) task DAG launching containers running random scripts from wherever".

[+] ithkuil|4 years ago|reply
Firecracker is great and all, but the core idea described here works with plain Docker too; i.e., there is nothing inherently Firecracker-specific about the basic technique.
[+] colinchartier|4 years ago|reply
Author here!

The three big differences are:

1. Docker doesn't deal with running processes (like Postgres or Redis), only the filesystem state.

2. Docker doesn't have enough isolation, so you'd probably need to run it within qemu or firecracker for compliance in bigger teams

3. Docker-in-Docker is still pretty painful; if you need to do anything nonstandard, like changing the size of /dev/shm, accessing /dev/kvm, or loading kernel drivers, it'll take custom configuration.
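For example, the /dev/shm and /dev/kvm cases mean passing explicit flags (or the equivalent compose/daemon config) through every layer of nesting; the image name here is a placeholder:

```shell
# Enlarge /dev/shm and expose the host's KVM device to the container;
# both require explicit per-container flags rather than working by default.
docker run --shm-size=2g --device=/dev/kvm your-test-image
```

In a VM, by contrast, the guest kernel owns these resources, so nothing has to be threaded through a container runtime.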

[+] jitl|4 years ago|reply
Yeah, I don’t like that the article treats baking the DB seed data, etc., into the Firecracker VM image as if it were impossible to do in Docker. The techniques are good things to do, but how the techniques are connected to Firecracker is very tenuous.

I’ve done all of the above using multi-layered Dockerfiles and a cron CI job to rebuild the base integration-test image every 6 hours. Sure, if you need the isolation, Firecracker is the way to go. But if you invest primarily in container shenanigans to speed up CI with Docker, it’s not too much extra work to wrap it in a Firecracker VM, plain QEMU, or whatever once you start wanting more isolation.

Also, maybe I’m holding it wrong, but Docker-in-Docker has not bitten us yet on our GitHub Action runners.

[+] lmeyerov|4 years ago|reply
Yep, BuildKit does incremental builds quite well.

We find the dominating factor in (our) incremental builds / CI to be network/IO caching, which has less to do with Firecracker/Docker and more with the surrounding hw/sw (GHA topology and smarts, IO speed, ...). It's a real problem in GPU/AI CI, where we get monster image sizes. There were some cool blog posts ~last year on caching and routing tricks happening at GH (joint with MSR?), but they've seemingly gone silent.

[+] lgierth|4 years ago|reply
You don't need a management daemon running, though, and you get a complete virtualized kernel that can be customized if needed.
[+] wyldfire|4 years ago|reply
Gee, why not just go straight to step 3 via fork/exec? Bound to shave off a few milliseconds beyond that 10x. And no firecracker required.
[+] melony|4 years ago|reply
If you're a cloud host, you need a way to sandbox hostile code. Firecracker allows you to do that (it is a configuration of the traditional KVM virtualization system, except lighter and faster: instead of booting a VPS, which can take minutes, you can now spawn one in under a second).
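The sub-second boot is just a handful of calls against Firecracker's API socket; a sketch, with the kernel/rootfs paths and socket path as placeholders:

```shell
# Start the VMM, then configure and boot a microVM over its Unix socket.
firecracker --api-sock /tmp/fc.sock &
curl --unix-socket /tmp/fc.sock -X PUT http://localhost/boot-source \
     -d '{"kernel_image_path": "vmlinux", "boot_args": "console=ttyS0 reboot=k panic=1"}'
curl --unix-socket /tmp/fc.sock -X PUT http://localhost/drives/rootfs \
     -d '{"drive_id": "rootfs", "path_on_host": "rootfs.ext4", "is_root_device": true, "is_read_only": false}'
curl --unix-socket /tmp/fc.sock -X PUT http://localhost/actions \
     -d '{"action_type": "InstanceStart"}'
```

There is no BIOS or device-probing phase; the guest kernel boots directly, which is where most of the speed comes from.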
[+] legulere|4 years ago|reply
Because process isolation under Unix is pretty lax. Processes by default have all the rights of the user. And you might end up with a system different from the initial state.
[+] rossmohax|4 years ago|reply
They seem to be comparing a CI runner starting from scratch to an always-on VM with Firecracker preconfigured.
[+] nicoburns|4 years ago|reply
Firecracker is a CI runner starting a VM for each run in this case, just a more optimised one, no?
[+] forgotusername6|4 years ago|reply
I used to do something similar with vSphere a while back. The servers took ages to get into the right state for testing, so it was much easier to just revert to a snapshot to get a clean state.
[+] kaivalyagandhi|4 years ago|reply
interesting, I wonder if you can use this with GitHub self hosted runners?
[+] greatgib|4 years ago|reply
It always amazes me to see the new trend of DevOps folks happily following such a tutorial, wget-ing and running random code from the internet in production...
[+] jrockway|4 years ago|reply
I don't think this is production; this is for running your tests. Your code in the "tests haven't run yet" state could plausibly leak all the secrets it has access to and destroy the machine it's running on, so you don't give it any secrets and you create a new machine each time. "curl | bash" here just injects potential flakiness (as does "npm install" when npm dies, etc.).

Obviously a lot of people treat their CI system as their CD system, and do things like letting tests have highly privileged access to their production k8s cluster. That's a terrible idea even if you aren't installing software with "curl | bash".

So overall, I don't think this is worth a HN comment to complain about. People are going to install software in non-auditable non-reproducible ways.

[+] jen20|4 years ago|reply
It always amazes me that people seem to think this is a _new_ trend.
[+] supermatt|4 years ago|reply
Pretty disingenuous to compare building vs. not building. Those Firecracker images need to be built too.

Sure, startup is faster, but the rest is nonsense.

[+] fideloper|4 years ago|reply
What do y’all run Firecracker on? The bare-metal servers on AWS (the only AWS servers you can run Firecracker on) are pretty expensive!
[+] iampims|4 years ago|reply
DigitalOcean, GCP, Hetzner, Raspberry Pi
[+] StreamBright|4 years ago|reply
Great article. Firecracker has been an amazing addition to my toolkit, and it is good to see it succeeding at solving real-world problems.
[+] n8ta|4 years ago|reply
Sounds like having an actual non-ephemeral computer with extra steps...
[+] tedunangst|4 years ago|reply
But why does it require firecracker and not qemu?
[+] colinchartier|4 years ago|reply
QEMU takes much longer to save/restore snapshots, and it's much harder to do via the API
[+] neatze|4 years ago|reply
What do these e2e tests do in the web app?

I don't understand why you need to rebuild the Docker image on every app build; this seems really wasteful.

[+] n8ta|4 years ago|reply
If the app itself is part of the image, you need to rebuild the image every time a dev wants to test their change.
[+] goodpoint|4 years ago|reply
This has been done successfully using VMs for two decades.