How we deploy faster with warm Docker containers

[+] rezonant|3 years ago|reply

Whoa, this seems to take for granted that devs targetting serverless infrastructure _must_ deploy up into the actual serverless infrastructure during development, unless I'm misreading? Why is there not a way to simulate the final infrastructure locally, so that development is possible without extremely inefficient pushes into a "dev" serverless deployment?

[+] miohtama|3 years ago|reply

It's called localless (:

[+] peterhunt|3 years ago|reply

I work on Dagster but I'm not the author.

Dagster is OSS and most folks do local development on their laptop which is quite fast. The speedups in this post are for production deployments (i.e. landing a commit in your master branch) and for branch deployments (the deployments that run on every pull request).

[+] tyingq|3 years ago|reply

Most of that issue isn't "serverless" itself, but the other things in the ecosystem it might talk to. A generic python AWS lambda is easy enough to simulate locally with nothing other than python and calling your lambda handler from main with a little boilerplate.

[+] shalabhc|3 years ago|reply

(author here)

You can already do local development with dagster, if you set up a local environment. Some users may not want to set up a local environment with all dependencies, secrets and so on, so they can use the remote environment option. Remote environments are also easy to share with other developers and can be used in addition to local environments.

[+] naikrovek|3 years ago|reply

there are solutions for this. localstack comes to mind, but it is more expensive per user than GitHub enterprise and copilot combined. the localstack devs are very proud of localstack.

[+] lewo|3 years ago|reply

> The key factor behind our decision was the realization that while Docker images are industry standard, moving around 100s of megabytes of images seems unnecessarily heavy-handed when we just need to synchronize a small change.

I think the culprit is more the GitHub Actions cache than Docker since it seems to be hard to get a clean cache management. I'm not sure about caching Docker image layers, but caching the Nix store with GitHub Actions is pretty complicated (not even sure it's possible): this means we have to download all required Nix store paths on each run, but i consider this is because of a GitHub Action cache limitation.

So, did you consider using another CI, which offers better caching mechanisms?

With a CI able to preserve the Nix store (Hydra[1] or Hercules[2] for instance), I think nix2container (author here) could also fit almost all of your requirements ("composability", reproducibility, isolation) and maybe provide better performances because it is able to split your application into several layers [2][3].

Note i'm pretty sure a lot of Docker CI also allows to efficiently build Docker images.

[1] https://hercules-ci.com/

[2] https://grahamc.com/blog/nix-and-layered-docker-images

[3] https://github.com/nlewo/nix2container/blob/85670cab354f7df6...

[+] FBISurveillance|3 years ago|reply

There's been a recent Launch HN of Depot.dev [1] - I've integrated it quickly into my GitHub Actions workflow and it's blazingly fast (13x speedups for me). It also was a drop-in replacement since I was using Docker Bake and Docker Action and Depot mimics that almost fully (except SBOM and provenance bits). It also works with Google Cloud Workload Identity Federation so image pushes to Artifact Registry didn't need any tweaking.

[1] https://news.ycombinator.com/item?id=34898253

Disclaimer: not affiliated, a happy paying customer.

[+] shalabhc|3 years ago|reply

Thanks for the interesting links - I'll check them out! We would need not just another CI but also another container platform because launching a docker container is also slow.

Irrespective of the CI, I believe all cached Docker layers will need to be downloaded onto the build machine before it can be rebuilt.

Still, I believe it is possible to build and deploy faster even with a "docker image only" design and it's something we are still looking at. The question is what is the lower bound here - would be hard to beat "sync a file to a warm container and run it". Pex gives us a pretty good lower bound that is also container platform agnostic.

[+] Aeolun|3 years ago|reply

While a great improvement, 45 seconds to change my serverless code still feels like a lot?

At least I’m comparing it to Cloudflare Workers, which deploys fast enough (<1 sec) that I can actually use it as a development environment.

[+] oofbey|3 years ago|reply

If I had to wait 45 seconds to see a single line of code change run, I would be looking for a new dev environment. Reminds me of compile times in the 1990’s. Serverless is “cool” and all but productivity is what really matters.

[+] shalabhc|3 years ago|reply

(author here)

I agree it would be fantastic to have sub second deploys! Doing this for user specified Python environments is challenging in different ways than doing it for a JS SDK like Workers. Note that just provisioning a GitHub runner takes about 10s, before any deploy code even starts up. In theory we could rsync the code directly to the code server and reload it. But any of these options require bigger architectural changes.

Also you can already have a local environment setup for faster iteration and use this for the shared environments with other developers, where the speedup is still great to have.

[+] moralestapia|3 years ago|reply

CF Workers run straight on top of V8, IIRC? Which I think it was a great design decision. But I wonder how they achieve that w/ the other backends like rust.

[+] IanCal|3 years ago|reply

I'm confused, this feels like a complicated way of creating a docker layer.

Building and hashing the dependencies is exactly what adding the requirements file/etc and building a layer does.

> moving around 100s of megabytes of images seems unnecessarily heavy-handed when we just need to synchronize a small change. Consider git – it only ships the diffs, yet it produces whole and consistent repositories.

Just ship the layers, that's literally what docker does, right?

Is this a whole build process just to get around the ephemeral nature of github actions?

[+] mschuster91|3 years ago|reply

The problem is you can't (easily...) create a layer atop of a Docker image without pulling all of it before.

Basically, a conventional Docker CI build process looks like the following in what I use (where <tag> is something like branch-foobar):

1. docker login <repo>

2. docker pull <repo>/<image>:<tag> || true

3. docker pull <repo>/<image>:branch-master || true

3. docker build --pull --cache-from <repo>/<image>:<tag> --cache-from <repo>/<image>:<master> -f Dockerfile.<branch> -t <repo>/<image>:<tag> .

4. docker push <repo>/<image>:<tag>

Using --cache-from cuts down dramatically on the build times since (assuming your branch-master tag gets rebuilt nightly) at least the Dockerfile layer steps for downloading the base OS image and installing software can be automatically skipped.

But still, even swapping out the last step in the Docker build makes it necessary to pull the entire image first - assuming your average Java application, that's like 500+ MB for OS+Java JRE+OS dependencies. If you're sure that the change is only in the last COPY/ADD step, you still have to pull the image despite it not being needed technically.

[+] nitwit005|3 years ago|reply

It seems strange to use a highly managed deployment environment like Fargate, but then build another deployment tool on top of it to do things in a simpler way.

It feels like EC2 is being reconstructed on a platform meant to hide it.

[+] nijave|3 years ago|reply

Or (dare I say) look into EKS. Kubernetes can spin up containers faster than ECS in my experience (as of ~1 year ago). Seems like the ECS control plane just has more latency (even with EC2 instead of Fargate)

[+] faizshah|3 years ago|reply

Is this approach not just Capistrano but using containers and specific to python? You could just use firecracker microVMs and ansistrano to implement this same workflow but get better isolation from firecracker.

[+] shalabhc|3 years ago|reply

(author here)

Firecracker is definitely very interesting. Would require more ops work for us to run bare metal EC2 (we currently use Fargate). IIUC reusing pre-existing environments would require us to share ext4 filesystems across the VMs. Not sure if antistrano helps here but will look into it.

[+] polyrand|3 years ago|reply

Quite cool to see PEX. I've used a similar package, shiv[0], with great results, and I always wondered why these are not used more. I think Python zipapps are really nice for bundling executables.

[0]: https://shiv.readthedocs.io/en/latest/index.html

[+] 8organicbits|3 years ago|reply

I'm curious about the remaining 20s to start the code on the container. What's the bottleneck there? It seems like you'd need to identify the container, tell it which S3 object has the new source.pex, the container would download it, and then when you run `PEX_PATH=deps.pex ./source.pex` you're up. All that feels like it should take less than 20s. If picking the container takes long, you could probably start that process as sources.pex is built.

[+] shalabhc|3 years ago|reply

Getting the message to the right container is one bottleneck. Currently this is routed through a couple of hops and includes some polling. This could all be optimized (if we had a direct line to the container) but the same messaging model is used in other contexts and would need architectural changes. Another bottleneck is running `source.pex` itself takes a few seconds to start up because it analyzes user code (and in some cases may do expensive computation.) But you're right: if `source.pex` as a hello world program, just downloading and running it should be pretty fast - I'd expect around 1s.

[+] est|3 years ago|reply

I remember once there was a method to use BitTorrent to deploy images/packages in intranet.

[+] hotpotamus|3 years ago|reply

I remembered one called "Murder" which looks like it was a Twitter thing back in the day[0].

Looks like Facebook did it too. I guess there was a phase that tech went through in 2010ish.

[0]https://blog.twitter.com/engineering/en_us/a/2010/murder-fas...

[+] nathants|3 years ago|reply

lambdas update-code api takes less than a second.

using a container instead of a zip for lambda has advantages, but speed is not one of them.

i auto rebuild my go zip and patch aws on every file save.

it’s done before i alt tab, up arrow, and curl.

script: https://github.com/nathants/aws-gocljs/blob/master/bin/dev.s...

[+] bobnamob|3 years ago|reply

Your workflow looks great!

Have you considered going full clojure and using blambda as a backend?

https://github.com/jmglov/blambda

[+] dsunds|3 years ago|reply

AWS has a few projects to reduce launch times for example https://aws.amazon.com/blogs/containers/reducing-aws-fargate... https://aws.amazon.com/about-aws/whats-new/2022/09/introduci...

[+] satyanash|3 years ago|reply

> Consider git – it only ships the diffs, yet it produces whole and consistent repositories.

IIRC git does _not_ ship diffs. It copies whole files even for the tiniest change. The compression layer handles the de-duping, which is a different layer.

[+] slavik81|3 years ago|reply

I would interpret 'ships' as 'pushes', because git does send delta packfiles on the wire.

[+] alfons_foobar|3 years ago|reply

Yup, you are correct - git stores snapshots of files, not diffs.

[+] jayrwren|3 years ago|reply

I don't see how this has anything to do with faster Docker containers. Looks more like faster python distribution.

[+] unknown|3 years ago|reply

[deleted]

60 comments