
XARs: An efficient system for self-contained executables

246 points| terrelln | 7 years ago |code.fb.com | reply

120 comments

[+] 0xbadcafebee|7 years ago|reply
I would use this if it didn't depend on OS-specific features. Squashfs is not portable to Windows, unless you extract it to disk.

I actually prefer the jar/Tomcat model, where the read-only image gets distributed to servers, and when you run the app the image gets unpacked to disk as needed. You could also write I/O wrappers that would obviate the need to extract them to disk, and you could even make compression optional to reduce performance hits.

It seems like all you really need is a virtual filesystem implemented as a userspace i/o wrapper. Basically FUSE but only for the one app. There's no need for the FUSE kernel shim because only the application is writing to its own virtual filesystem. So this would work on any operating system that supported applications that can overload system calls.

For example, I would start with this project http://avf.sourceforge.net/ and modify it to run apps bundled with itself. With FUSE installed, other apps could interact with its virtual filesystem, but without FUSE, it could still access its own virtual filesystem in an archive. I would then extend it by shimming in a copy-on-write filesystem to stack modifications in a secondary archive.
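
The userspace I/O wrapper idea above can be sketched in a few lines. This is only an illustration under loud assumptions: it uses a zip archive rather than SquashFS, the `ArchiveOpener` class, the `/app/` prefix and the `demo` helper are hypothetical names, and it wraps an explicit `open` method instead of intercepting system calls the way AVFS or a real shim would.

```python
import builtins
import io
import os
import tempfile
import zipfile


class ArchiveOpener:
    """Hypothetical per-app I/O wrapper: paths under a virtual prefix are read from a zip."""

    def __init__(self, archive_path, prefix="/app/"):
        self._zip = zipfile.ZipFile(archive_path)
        self._prefix = prefix
        self._names = set(self._zip.namelist())

    def open(self, path, mode="r", encoding="utf-8"):
        member = path[len(self._prefix):] if path.startswith(self._prefix) else None
        if member not in self._names:
            # Not part of the bundle: defer to the real filesystem.
            return builtins.open(path, mode)
        if any(flag in mode for flag in "wax+"):
            raise PermissionError("bundled files are read-only: " + path)
        raw = self._zip.open(member)  # decompressed incrementally as it is read
        return raw if "b" in mode else io.TextIOWrapper(raw, encoding=encoding)

    def close(self):
        self._zip.close()


def demo():
    """Build a tiny archive, then read one member without extracting it to disk."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "bundle.zip")
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.writestr("conf/settings.txt", "debug=1\n")
        fs = ArchiveOpener(archive)
        try:
            with fs.open("/app/conf/settings.txt") as handle:
                return handle.read()
        finally:
            fs.close()
```

Calling `demo()` returns the bundled file's text straight from the archive. A copy-on-write layer, as suggested above, would sit in front of the read-only branch and redirect writes to a second archive or directory.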

[+] twsted|7 years ago|reply
> I would use this if it didn't depend on OS-specific features.

I always make the same mistake of assuming that no one would deploy something on Windows for a server-side environment.

[+] ctur|7 years ago|reply
I agree it is a bummer that FUSE doesn't directly work on Windows, but it should be doable -- we would love for someone to figure out the best way to do this on Windows. Happy to collaborate with anyone who'd like to make this a reality.
[+] ctur|7 years ago|reply
Hi, I'm Chip, one of the authors of the blog post and XAR itself. Happy to answer any questions anyone may have about how XARs work, the way we use them, or the motivations that drove their development.
[+] shock|7 years ago|reply
Hi Chip, thanks for releasing this as open source. From a quick look XARs seem pretty similar to AppImages in the sense they both use an executable preamble and a squashfs so I'm wondering which would be a better choice for me to distribute my Python apps, and why. Thanks!
[+] raghava|7 years ago|reply
Hi Chip!

Did you evaluate flatpak/appimage/snap before developing XARs? If so, what were the shortcomings that you noticed in them?

I have found reasonably good success with snap on Python and node apps, but am not an expert on them. Just want to know from other practitioners about any gotchas that others might have studied/stumbled upon.

Thanks!

[+] mmastrac|7 years ago|reply
Awesome work. Did your team evaluate creating a virtual filesystem that could process the SquashFS images without involving the kernel? Having completely independent executables that could run on _any_ system with zero additional install would be sweet.

To clarify - a stub in each XAR would act as a filesystem driver and intercept calls to open/read/etc, redirecting them to the internal data blob.

Edit: I see your comment below which answers this! https://news.ycombinator.com/item?id=17524910

[+] jwatte|7 years ago|reply
Chip, thanks for sharing!

Could you have solved the same problem ("deploy apps with dependencies in a single file") using snaps? (snapcraft.io)

[+] jkingsbery|7 years ago|reply
How do you pronounce "XAR"? "Ex-AR"? "Sar"? "Shar"?
[+] oahayder|7 years ago|reply
Hi Chip. How does Facebook deploy XARs? Is one advantage over Docker containers that you don't have to run the Docker daemon on every machine?
[+] rgovostes|7 years ago|reply
Probably too late now, but xar already stands for "eXtensible ARchiver" and is a file format used on macOS in some package installers. It's notable for having an embedded XML "table of contents" that describes metadata of the archived files, so new fields can easily be added while maintaining backwards compatibility. (Compared to, say, the zip file format which does not even specify how to store Unix file modes.) https://en.wikipedia.org/wiki/Xar_(archiver)
[+] JohnDotAwesome|7 years ago|reply
Names are hard. My name is John. Programmers get really confused when I tell them that. "Did you know there's another programmer named John? You probably should have researched names before you decided to go with that one"
[+] berti|7 years ago|reply
Yes, I was really confused by the AppImage comparisons in this thread until I realised this wasn't the XAR I already knew. It's hard to imagine they couldn't have easily checked the name before claiming it. The (Darwin) XAR page on Wikipedia was created in 2006 [1]...

[1] https://en.wikipedia.org/wiki/Xar_(archiver)

[+] rwmj|7 years ago|reply
I'm not really a fan of containers, but I read this and thought "why not containers"?

The page mentions cryptically "They could almost be thought of as a self-executing container without the virtualization". The "self-executing" bit makes sense - you don't have to remember to type "docker". "without the virtualization" doesn't make sense unless they mean without cgroups or are talking about Kata Containers.

[+] secure|7 years ago|reply
Cool idea! Is there any particular reason to use SquashFS via FUSE instead of via the Linux kernel driver?

Slightly related: we also recently switched to SquashFS for gokrazy.org’s root file systems.

If you’re curious about how SquashFS works under the hood, check out https://github.com/gokrazy/internal/blob/master/squashfs/wri.... I also intend to publish a poster about it at some point.

[+] ctur|7 years ago|reply
We actually started with using "real" squashfs files. This had three main disadvantages:

- We had to maintain our own setuid executable to perform the loopback setup and mount (rather than relying on the far more tested and secure open source fusermount setuid binary that all FUSE file systems rely on)
- Getting loopback devices to behave inside of containers (generally cgroup and mount namespace containers) was a little tricky at times in some of our environments
- We didn't want to have a huge number of extra loopback devices on every host in our fleet

In fact, after implementing the loopback-based filesystem version, we almost abandoned XAR as the downside of the security considerations and in-container behavior wasn't ideal. The open source squashfuse FUSE filesystem really is what made it possible.

Another side benefit is we could iterate far faster with squashfuse -- this let us fix some performance issues, add idle unmounting, and implement zstd-based squashfs files, and then deploy that to our fleet, faster than we could deploy a kernel to 100% of hosts.

[+] jillesvangurp|7 years ago|reply
Not a bad idea. I wonder how this compares to ubuntu's snaps. Seems like a good idea to me but I've not really seen it used much yet.

On OS X, apps have been distributed in .app form for ages. It's very uncommon for OS X apps to have installers or a more complicated installation (and uninstallation) than drag and drop.

So, good idea and it kind of fixes a big issue where most linux distributions seem to insist on dll hell with just about anything littering the file system with cruft and just about every interpreter out there reinventing ways to create virtual environments.

[+] juliangoldsmith|7 years ago|reply
This all reminds me a bit of Tiny Core Linux. IIRC, it uses SquashFS images for all its packages, mounts them in a specific spot, then uses either symlinks or UnionFS to put everything together.
[+] FRidh|7 years ago|reply
Yet another format for self-contained executables, and one that looks pretty similar to the already existing AppImage.

Note that Nix users can use `nix-bundle` to create AppImages of all the software in Nixpkgs, which is according to Repology one of the largest and freshest package sets: https://repology.org/statistics

[+] fenesiistvan|7 years ago|reply
I am a windows developer and the single thing that stops me porting my apps to linux is an easy to use deploy method. Is there some good way to handle this task without spending months learning about linux administration like shell scripts, finding the best place for configs and logs on different linux distros, daemon setup, etc.? Something simple and distro independent would be fine...
[+] rwmj|7 years ago|reply
I am a Linux developer and the single thing that stops me porting my apps to Windows is an easy to use deploy method. Is there some good way to handle this task without spending months learning about Windows administration like installers, MSI, the registry, logging, services?

The non-flippant answer is to just provide the source and let the distros package it for you. It's a different model. Linux users want to get their software through an integrated package manager, and volunteers will take your software and do all the work needed to make that happen.

[+] juliangoldsmith|7 years ago|reply
Is your application open source?

If so, I wouldn't worry about packaging for every distro. Just make sure that your application isn't difficult to build, and most things (paths, etc.) are configurable. The config path can be handled with a command line switch.

Once you've gotten your application so that it can be built easily, I'd only really worry about packaging it for your distro of choice. If people are interested in your application, it'll get packaged for their distros.

If your application is proprietary, I wouldn't even worry about packaging it yet. Getting most Linux users on non-essential proprietary software will be an uphill battle.

[+] shock|7 years ago|reply
AppImage⁰ seems to be what you're looking for. XARs maybe, but I don't have experience with them to recommend them.

⓪ - https://appimage.org/

[+] nvivo|7 years ago|reply
I must say as a long time windows developer, .net core on linux is much simpler than on windows. I'm moving everything I have from windows to Linux and the experience, simplicity, speed, stability, etc are much better on linux.

I guess if you're working with gui that's a different story. But for .net websites and background servers, simply use docker on linux and never look back. And it's much simpler than docker for windows too.

[+] sjellis|7 years ago|reply
For desktop applications, Flatpak does this. Otherwise, containers or the new "portable services" feature of systemd. Thanks to systemd, there are standardised ways of doing a lot of things, but yeah, distributions do vary, so you really need an abstraction, or target just a few of the popular distributions.
[+] jwatte|7 years ago|reply
What you want is snapcraft.io
[+] nikolay|7 years ago|reply
Is Facebook an NIHS (Not Invented Here Syndrome) sufferer?
[+] lttlrck|7 years ago|reply
Indirectly this proves to be a useful discovery mechanism for me - when tools crop up on HN I think ‘hey that’s interesting’, then oftentimes when I read the comments I find there are numerous existing solutions I had never heard of along with helpful links and insightful info :-)

It’s brilliant. It’s one of the reasons I value this site so much.

[+] dagenix|7 years ago|reply
Does the XAR file contain the Python executable itself, or does running it rely on having Python installed on the host already?
[+] jarvuschris|7 years ago|reply
Have you looked into Habitat? It provides a similar result with a complete build workflow that works across technologies and platforms: https://www.habitat.sh/

There's a rapidly growing library of libraries and services packaged with it: https://bldr.habitat.sh

Its build artifacts can be exported to a number of formats including container images and tarballs, maybe a XAR exporter could be built: https://github.com/habitat-sh/habitat/tree/master/components...

[+] JohnDotAwesome|7 years ago|reply
Seems like Habitat (which looks awesome by the way) relies on Docker. Which, if you consider performance heuristics in the article (size, cold/hot start time), may be a non-starter for what they're trying to do.
[+] ASinclair|7 years ago|reply
How similar is this to Google PAR/SAR executables for Python and Bash scripts respectively?
[+] terrelln|7 years ago|reply
Facebook's PAR is a self-extracting zip file; I assume Google's is similar. XARs are self-mounting SquashFS archives (a compressed read-only filesystem). This means that XARs don't have to be extracted to a temporary directory to run; they can run in place. Zip files have to be completely extracted before running, but SquashFS decompresses pages on the fly, so startup times are much faster (especially with zstd compression).
[+] wesleyy|7 years ago|reply
I don't think Google's implementation of their hermetic par files is open source.
[+] aumerle|7 years ago|reply
It's somewhat counter-intuitive that start times with XAR are lower than start times without it. Is fuse faster than a kernel filesystem? Even with compression?
[+] ctur|7 years ago|reply
FUSE isn't generally lighter weight than a filesystem but it can be relatively competitive for simple use cases like a read-only filesystem. Additionally, squashfs lets you pack metadata and data very tightly, and since it is a readonly filesystem, has some optimizations normal filesystems can't (how data is placed, overhead of managing metadata operations, etc). Also squashfs lets you choose how the files are laid out and compressed so that all files of a certain type, such as all .pyc files, are close together, which increases compression ratio and reduces overhead for subsequent file accesses (i.e., can reduce random disk or flash IO).

In practice the timings of XAR vs filesystem are close enough to be "in the noise" -- it's when compared to PEX or PARs that the difference is quite large.
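
The layout point above (keeping similar files next to each other helps the compressor) can be seen in a toy experiment. This is not squashfs: it is a rough sketch using zlib, whose match window is 32 KiB, and the two payloads are made-up stand-ins for "all files of one type share a lot of content". The names `make_payload` and `compressed_size` are hypothetical helpers.

```python
import random
import zlib


def make_payload(seed, size=20 * 1024):
    """Deterministic pseudo-random bytes standing in for one file type's shared content."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(size))


def compressed_size(chunks):
    """Size in bytes of the chunks concatenated in the given order and deflated."""
    return len(zlib.compress(b"".join(chunks), 9))


pyc_like = make_payload(1)
txt_like = make_payload(2)

# Same eight "files" in both layouts; only the order differs.
grouped = [pyc_like] * 4 + [txt_like] * 4
interleaved = [pyc_like, txt_like] * 4

grouped_size = compressed_size(grouped)
interleaved_size = compressed_size(interleaved)
print("grouped:", grouped_size, "interleaved:", interleaved_size)
```

In the grouped layout each repeat sits 20 KiB behind its twin, inside the window, so it compresses to back-references. In the interleaved layout the twin is 40 KiB away and out of reach, so every copy is stored almost in full.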

[+] terrelln|7 years ago|reply
I spent some time today investigating what exactly is causing the difference between native and XAR start times. I confirmed the culprit is `pkg_resources.load_entry_point()`. Modern installations using wheels should avoid this overhead, and those native installations will be slightly faster than XARs:

black: 0.171 s (vs 0.208 s for XAR)
jupyter: 0.165 s (vs 0.179 s for XAR)

My test setup used the older loading method because "pip install ." won't install wheels if the wheel package isn't installed in the virtualenv.

[+] terrelln|7 years ago|reply
Admittedly I haven’t profiled this yet, but my guess is it is a constant overhead of setting up pkg_resources that the native code uses to load the entry point.

The test against native start speed was hot, so the pages required were already in the page cache, so the filesystem shouldn’t matter.
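
For anyone who wants to try a hot-start comparison like this themselves, here is a minimal sketch. It is an assumption about how one might measure it, not the harness used for the numbers in this thread; `hot_start_time` is a hypothetical helper, and the warm-up run is only there so the needed pages are likely already in the page cache.

```python
import statistics
import subprocess
import sys
import time


def hot_start_time(argv, warmups=1, runs=5):
    """Median wall-clock seconds to run argv to completion after warm-up runs."""
    quiet = {"stdout": subprocess.DEVNULL, "stderr": subprocess.DEVNULL}
    for _ in range(warmups):
        # Warm-up runs are not timed; they pull files into the page cache.
        subprocess.run(argv, check=True, **quiet)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, **quiet)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Baseline: how long a bare interpreter takes to start and exit.
    print(hot_start_time([sys.executable, "-c", "pass"]))
```

Pointing `argv` at a natively installed entry point and at the corresponding XAR (for example with `--version` or `--help`) gives two medians to compare.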

[+] TheAceOfHearts|7 years ago|reply
I'm on mobile, but do you have an example of bundling a node app somewhere?

I'm curious how it compares to using something like pkg [0].

[0] https://github.com/zeit/pkg

[+] terrelln|7 years ago|reply
We currently don't have a nice open source API for building node apps, but would welcome PRs that get us in this direction!

There are two ways to build a node app using the XAR builder tools.

1. Use the `make_xar` tool which will create a XAR from a directory and takes an optional script to run on execution.
2. Use the XAR builder library to make a XAR builder that is specialized for building node apps.

[+] dcgudeman|7 years ago|reply
Requirements

Python >= 2.7.11 & >= 3.5

you need both?

[+] russellbeattie|7 years ago|reply
Facebook spent a decade contributing virtually nothing to open source, now they're flooding the world with random projects of varying and questionable value - most developed as a result of Facebook's severe N.I.H. attitude. I'm honestly not sure which is worse.
[+] saagarjha|7 years ago|reply
Kind of an unfortunate name, considering that xar (eXtensible ARchive) is already a thing: https://en.wikipedia.org/wiki/Xar_(archiver)
[+] nine_k|7 years ago|reply
I'd hazard to say that almost any 3-letter abbreviation has been already taken, many of the easy-to-pronounce ones, multiple times.