Run Nix Based Environments in Kubernetes

ronef|3 months ago

Ron from Flox here, woke up to feed a brand new 3 day old to see this here! On about 3 hours of sleep (over the lat 48 hours) but excited to try and answer some questions! Feel free to also drop any below <3

We did just launch this last week after a good bit of work from the team. Steve wrote up a deeper technical dive here if anyone is interested - https://flox.dev/blog/kubernetes-uncontained-explained-unloc...

wathef|3 months ago

congrats on the little one, here’s to many wonderful moments.

whazor|3 months ago

When I worked on an enterprise data analytics platform, a big problem was docker image growth. People were using different python versions, different cuda versions, all kinds of libraries. With Cuda being over a gigabyte, this all explodes.

The solution is to decompose the docker images and make sure that every layer is hash equivalent. So if people update their Cuda version, it result in a change within the Python layers.

But it looks like Flox now simplifies this via Nix. Every Nix package already has a hash and you can combine packages however you would like.

__MatrixMan__|3 months ago

I was an early and enthusiastic adopter of docker. I really liked how it would let me use layers to keep track of dependency between files.

After spending a few years using nix, the docker image situation looks pretty bonkers. If two files end up in separate layers, the system assumes dependency so if the lower file changes you need to build a separate copy of the higher one just in case there's actual dependency there.

Within nix you can be more precise about what depends on what, which is nice, but you do have to be thoughtful about it or you can summon the same footgun that got you with docker, just in smaller form. Because a nix derivation, while a box with nicely labeled inputs and output, is still a black box. If you insert a readme as an input to a derivation that does a build, nix will assume that the compiled binary depends on it and when you fix a typo in the readme and rebuild you'll end up with a duplicate binary build in the nix store despite the contents of the binary not actually depending on the text of the readme.

> you can combine packages however you would like

So this is true, more or less, but be aware that while nix lets you do this in ways that don't force needless duplication, it doesn't force you to avoid that duplication. Things carelessly packaged with nix can easily recreate the problem you mentioned with docker.

justincormack|3 months ago

Yes, there were various attempts to do this in the container ecosystem, but there is a hard limit on layers on Docker images (because there are hard limits on overlay mounts; you don't really need to overlay all the Nix store mounts of course as they have different paths but the code is for teh geenral case). So then there were various ways of bundling sets of packages into layers, but just managing it directly through Nix store is much simpler.

ronef|3 months ago

Yes, this hits the nail on the head. We’ve seen the same explosion in image size and rebuild complexity, especially with AI/ML workloads where Python + CUDA + random pip wheels + system libs = image bloat and massive rebuilds.

With the Kubernetes shim, you can run the hash-pinned environments without building or pulling an image at all. It starts the pod with a stub, then activates the exact runtime from a node-local store.

rootnod3|3 months ago

I used to love both, Kubernetes and Nix. But after a few years of using both I felt like the abstraction levels are a bit too deep.

Sure, it's easy to stand up a mail server in NixOS, or to just use docker/kubernetes to deploy stuff. But after a few years it felt like I don't have a single understanding of the stack. When shit hits the fan, it makes it very difficult to troubleshoot.

I am now back on running my servers on FreeBSD/OpenBSD and jails or VMM respectively. And also dumbing the stack down to just "run it in a jail, but set it up manually".

The only outlier is Immich. For some reason they only officially support the docker images but not a single clear instruction on how to set it up manually. Sure, I could look at the Dockerfiles, but many of the scripts also expect docker to be present.

And now that FreeBSD also has reproducible builds, it took one more stone away from Nix.

ronef|3 months ago

Going to sound weird but with both my hats on I super appreciate this perspective. I can only speak to some areas of Nix and Flox obviously and I know folks are looking into doing this to your point a whole lot better. Zooming in way more into solving for us that just want to run and fix it fast when it breaks.

Also, think it's a huge ecosystem win for FreeBSD pushing on reproducibility too. I think we are trending in a direction where this just becomes a critical principle for certain stacks. (also needed when you dive into AI stacks/infra...)

antonvs|3 months ago

Kubernetes can be a godsend at larger orgs.

We have six dev teams and are just about done with migrating to k8s. It's an immense improvement over what we had before.

It's a version of Greenspun's tenth rule: "Any sufficiently complicated distributed system contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Kubernetes."

nrhrjrjrjtntbt|3 months ago

How does this differ from the tooling that lets you build containers from nix?

dlahoda|3 months ago

seems similar to this

https://github.com/pdtpartners/nix-snapshotter

so kind of allowing pull images from nix store, mounting shared host nix store per node into each container, incremental fast rebuilds, generating basic pod configs are good things.

and local, ci and remote runs same flows and envs.

ronef|3 months ago

Jotting down a few quick thoughts here but we can totally go deep. This is something Michael Brantley started working on a few months ago to test out how to make it super easy to ease and leverage existing Nix & Flox architecture. One of the core differences from my quick perspective is that it specifically leverages the unique way that Flox environments are rendered without performing a nix evaluation, making it safe and optimally performant for the k8s node to realize the packages directly on the node, outside of a container.

CuriouslyC|3 months ago

Too bad this isn't open source, I'm 3/4ths of the way through building pretty much this exact product in order to support my actual products.

natebc|3 months ago

Is it not GPL?

The license file in their github seems to indicate that it is. https://github.com/flox/flox?tab=GPL-2.0-1-ov-file

nixosbestos|3 months ago

So, nix-snapshotter? Also, Flox going all in on "environments" seems like such a choice. I'm sure that Flox is not encouraging shipping a binary-in-a-devshell to Prod, so it seems an interesting branding decision.

It's hard for me to understand if I should be excited about this. I think companies do themselves such huge disservices from not being transparent to the nerds that WILL be the ones helping choose/implement these things. Instead of the current feeling I have, there could be three sentences that explains what Flox is offering here beyond what *anyone* can go do right now with nix-snapshotter.

If it's ecosystem stuff (you get Flox's CI, or CLI, or whatever else), that's not very well sold to me on the landing page. Otherwise I'm feeling left empty-handed.

jeremy_flox|3 months ago

Totally valid - we buried the lede here. Quick version:

Not nix-snapshotter because we skip Nix eval entirely and get way better cache sharing across unrelated workloads (quantized catalog means everything shares base deps). On "environments": these aren't devshells-as-prod, they're the actual runtime; same as 'flox activate' works everywhere. You're shipping a declarative, hash-pinned runtime that happens to also work great in dev/CI

And yeah, we should have been upfront that this is alpha and we're planning to open source it after vetting at KubeCon.

You're right that we're doing ourselves a disservice not being transparent with the technical crowd. What specific technical details would help you evaluate this?

yencabulator|3 months ago

This is poor messaging, trying to latch on to a "serverless" style buzzword without matching its meaning:

> Kubernetes, Uncontained

> Are you replacing containers?

> No. Kubernetes still runs containers.

jeremy_flox|3 months ago

Jeremy from Flox, here, I want to chime in here so Ron can be with his family, even though he will no doubt be right back on here:

Re: Relationship to nix-snapshotter and prior art This is original work, though very much built on prior innovations. Our approach hooks into the upstream containerd runc shim to pull the FloxHub-managed environment and bind-mount the closure at startup. The key distinction is that we use how Flox environments are rendered to avoid Nix evaluation entirely, making it safe and fast for a k8s node to realize packages directly on the node. Less about images and containers, per se, and more out bringing the power of Flox and Nix at the buildtime end to the runtime end of SDLC.

The cache story is surprisingly strong: nix store paths effectively behave like layers in the node’s registry, but with dramatically higher hit rates -- often across entirely unrelated pod deployments. Because all pods rely on the same underlying system libraries drawn from the “quantized” Flox catalog, different environments naturally share glibc, core utilities, and common dependencies, where traditional containers typically share nothing.

Tools like nix-snapshotter, Nixery, and others have pioneered this space and we're grateful for that work. This rising "post-Docker" tide raises all ships.

Re: Open Source The software is brand new -- only slightly older than Ron’s baby -- and currently in alpha. KubeCon was our first opportunity for broad feedback, and we uncovered a few issues we’re still addressing. Our intent is to open-source the project once we’ve fully vetted the approach, ideally in the coming weeks.

Yes, we launched early and the product is imperfect, but we’re doing so transparently and with a commitment to getting it right and releasing it to the community, we will continue to release early and often.

Re: Abstraction depth concerns I appreciate @rootnod3’s point about deeper abstractions complicating debugging. We’re thinking hard about how to keep things simple for people who need to run and fix systems quickly. It’s encouraging to see the broader ecosystem—like FreeBSD—lean further into reproducibility, especially as AI-centric stacks make this increasingly important.

Re: Nix vs traditional approaches Skilled Dockerfile authors can achieve great caching results -- and you can pin and you can prune registries, etc -- but our goal is to make these best practices the default. Nix enables finer-grained caching and a universal packaging format for building and consuming open source software.

We see intrinsic value in Flox environments -- whether on the CLI, k8s, Nomad down the road, or other platforms. Our aim is for Flox environments to be as universal and natural as Nix packages themselves -- essentially extending “flox activate” into the k8s world.

We likewise got a ton of valuable feedback at KubeCon, most of which was validating, all of which was very inline with this conversation.

robinhoodexe|3 months ago

First, congrats on the release. I’ve looked at flox and devenv for nixifying our container builds. Our distribution of languages is about 40/30/20/10 of Python, F#, R and nodejs.

A dilemma I’m facing is that the win from nix in terms of faster builds and smaller images would be largely from python and R images (where the average size is often 1Gi or larger). However, the developers that use Python or R are less likely to “get” the point of Nix and might have a steeper learning curve than F# developers (where the builds are quite efficient).

That was the context, my question is, how’s the integration with Flox and R/RStudio? I know there’s Rix[1] for managing R packages with Nix.

[1] https://github.com/ropensci/rix

nickysielicki|3 months ago

What constraints/coordination exists with this, in terms of host driver support? What enforces that Nix does not attempt to use a newer cuda toolkit on a host with an older cuda driver?

jeremy_flox|3 months ago

You pin the CUDA toolkit version compatible with your driver; manifest.lock ensures zero drift.

Driver version is host-managed (stable), toolkit is hash-pinned (stable). No drift on either side.

Initial selection matters (pick compatible versions, nothing automatic will stop you from initially installing the wrong thing), but once pinned, and unless/until you change it, you get the same toolkit forever regardless of catalog updates or time passing.

the_real_cher|3 months ago

[deleted]

CuriouslyC|3 months ago

You can say what you want about Kube (it's a bit of a necessary evil for the people that need it), but keep Nix's name out yo damn mouth. It's for real.

48 comments